**Spatiotemporal Modeling of the Smart City Residents' Activity with Multi-Agent Systems**

#### **Robert Olszewski 1, Piotr Pałka 2,\*, Agnieszka Turek 1, Bogna Kietlińska 3, Tadeusz Płatkowski 4 and Marek Borkowski 2**


Received: 30 April 2019; Accepted: 16 May 2019; Published: 19 May 2019

**Abstract:** The article proposes a concept for modeling, with multi-agent systems, the mutual interactions between city residents as well as the interactions between residents and spatial objects. Adopting this perspective means treating residents, as well as buildings and other spatial objects, as distinct agents that exchange multifaceted packages of information in a dynamic and non-linear way. The exchanged information may be reinforced or diminished in the process, which may change the social activity of the residents. Building on Latour's actor–network theory, the authors developed a model for studying the relationship between demographic and social factors and the diversified spatial arrangement and structure of a city. This concept was used to model the level of residents' trust spatiotemporally and, indirectly, to study the level of social (geo)participation in a smart city. The devised system, whose test implementation was done in GAMA, an agent-based, spatially explicit modeling and simulation platform, was tested on both model and real data. The results obtained for the model city and for Warsaw, the capital of Poland, indicate the significant and interdisciplinary analytical and scientific potential of the authorial methodology in the domains of geospatial science, geospatial data models with multi-agent systems, spatial planning, and the applied social sciences.

**Keywords:** multi-agent systems; smart city development; spatiotemporal modeling; actor–network theory; geoparticipation; social interactions

#### **1. Introduction**

Many different disciplines use multi-agent systems as a research tool. One of them is the analysis of social relations in the city, as well as of the interactions between residents and spatial objects (the background of this research). The open problem addressed here is the analysis of the various factors that influence changes in the level of residents' social engagement in the process of social participation, above all, changes in the level of mutual trust amongst residents and of their trust in social institutions. The multi-agent system (abbreviated MAS in the remainder of the paper) proposed by the authors, which models the process of changing the social engagement of residents, is the main contribution of this research integrating the applied social sciences, geoinformation technologies, and multi-agent systems.

In his 2007 article [1], Michael F. Goodchild introduced the concept of the social assembly of spatial information by users identified as specific human agents. Implementing this idea brought about the rapid development of so-called volunteered geographic information (VGI), manifested, e.g., by the crowdsourced creation of OpenStreetMap by over 3 million active users around the world.

The purpose of this article is to develop the concept of smart city residents as active urban sensors represented by agents in a MAS. Consequently, city residents are considered elements of a specific geospatial multi-agent system. Mutual interactions between the residents, as well as the impact that these "human agents" have on spatial objects, influence the residents' activity in the long term.

It should be emphasized that for this concept, the crucial assumption is that mutual interactions between residents and spatial objects are characteristic of a complex multi-agent system in a smart city. The non-linear exchange of packages of information between individual elements of the system, under special conditions, leads to reinforcing information and "activating" residents in the process of social participation. It is assumed that the residents (represented by agents) also interact with different features of the city, which tend to modify their trust.

In recent decades, the crisis of participatory democracy has been particularly severe in urban centers and areas subject to urbanization. Its outcome is the weakening of the sense that the residents have a real impact on co-creating the vision for a city's development, revitalization of the neglected districts, or spatial order. When interpreting the term "the right to the city," David Harvey [2] emphasizes that it is not only the right to access its resources, but also the right to decide jointly on the direction of the city's development. A smart and sustainable city engages all of its residents in the most critical decision-making processes, making the process of creating spatial order more social, and encouraging the development of participatory and deliberative democracy. An increase in mutual trust among residents and their trust in public institutions are of crucial importance to stimulating the activity. Therefore, the goal of the authors is to model complex social interactions between the urban residents and to establish the level of trust, sense of identity, willingness to participate in the social (geo)participation processes, and the dynamics of changes over time. The devised model has been tested not only on the example of the "model city," but also on the real urban agglomeration of Warsaw, Poland.

The authors intend to develop the concept of a multi-agent decision support system that makes use of game theory, multi-agent systems, and market programming models to support the "weakest link" of the asymmetric urban network's triangle of "municipal authorities" (politics), "business" (urban developers, industrial investments, etc.), and "residents". To transform atomized individuals into a cooperative urban community, it is necessary to use available means of electronic communication, information and communication technologies (ICT), and geoinformation tools, as well as to revive the Athenian idea of the (urban) agora that decides on the city through a process of social debate. Such an asymmetry is the basic rule producing the majority of the urban relations described in Latour's actor–network theory [3]. According to this theory, the basic element of every network is an actant, meaning a factor influencing all the other factors. In this article, the social sciences' notion of an actant is used synonymously with an agent. The term applies both to a "human agent", i.e., a resident characterized by a vector of information specific to his/her age, education, place of residence and work, general health condition, base level of social activity, trust, and so on, and to a spatial object (a district, office, park, and so on) that influences the inhabitants. These factors influence each other, forming a system of actors and networks.

On the basis of this theory, the authors of the article developed a model of interaction between the sensors–actants (both residents and spatial objects that form the urban tissue), which enables the simulation of social interaction processes as well as, indirectly, participatory democracy and the process of social participation of smart cities' residents. The critical element of this process is stimulating the civic activity of the citizens by increasing the level of trust, both in people and institutions.

The mutual trust of residents, as well as the level of the citizens' trust in institutions (e.g., municipal and security authorities, planners, educational institutions, the healthcare system, and so forth), is particularly important with regard to implementing the idea of a smart city. While technological development is important to ensure the effectiveness of the process, the development of social interactions among the residents and their joint decision-making on the vision for the city's development is crucial. As stressed in [4], cities of tomorrow need to adopt a holistic model of sustainable urban development. A city is smart when public issues are solved using information and communications technology (ICT) with the involvement of various types of stakeholders acting in partnership with the municipal authorities (see [5]). The implementation of a smart city is strongly related to the process of social (geo)participation; according to PAS 180:2014 (a Publicly Available Specification of the British Standards Institution), this means an "effective integration of physical, digital and human systems in the built environment to deliver a sustainable, prosperous and inclusive future for its citizens." The New Urban Agenda likewise stresses the need to empower all individuals and communities and to promote and broaden inclusive platforms that allow full and meaningful participation in decision-making and planning processes (see [6]). A city can be considered smart when it invests, in parallel, in technology and human capital to actively promote sustainable economic development and a high quality of life (e.g., when it enables natural resource management through civic participation).

The contributions of the paper are: (i) the adaptation of sociological concepts (Bruno Latour's actor–network theory [3], Edward T. Hall's social distances [7], and Erving Goffman's social interactions [8]) to sensors–actants, which enables the simulation of social interaction; (ii) the implementation of the model using the multi-agent methodology [9] in the GAMA toolset environment; (iii) the spatiotemporal and sociological development of the concepts of two smart cities and their implementation in GAMA; and (iv) an illustration of the model's operation in typical situations occurring in cities. These approaches made it possible to model the social activity of residents in a dynamic and non-linear way, to conduct spatiotemporal analyses, and to create geospatial data models.

The article consists of six sections. After a short introduction (Section 1), the authors describe related work on the analyzed issue and motivate the choice of methods (Section 2). Subsequently, the authors discuss the research methodology (Section 3): the actor–network theory (Section 3.1), which is the basis of the model, the method of city modeling used in this article (Section 3.2), the modeling of citizens as agents (Section 3.3), and the interactions in which these agents participate (Section 3.4). The next section describes in detail, validates, and calibrates the model city (Section 4), using two scenarios: Terra incognita (Section 4.1) and Old Factory revitalization (Section 4.2). This is followed by an analysis of the spatiotemporal model of the city of Warsaw (Section 5) with three scenarios: Parade Square (Section 5.1), "Mordor" on Domaniewska Street (Section 5.2), and Miasteczko Wilanów (Section 5.3). The work ends with a discussion and conclusions (Section 6).

#### **2. Related Works**

The interdisciplinary nature of the research undertaken by the authors of this article requires referencing numerous concepts and methods derived from urban planning, spatial planning, sociology, and spatial science, as well as mathematics and computer science.

There have been numerous attempts at developing appropriate tools using modern technologies, e.g., geospatial multi-agent system design and integration, agent-based systems, machine learning, data mining, augmented reality, virtual reality, and 3D models, to ensure the effective participation of citizens in urban and territorial development decision-making, including game-theoretic treatments (see [10,11]). In [12], the authors predict that the widespread presence of smartphones will soon mean that citizens are treated as a network of sensors that the city uses for continuous development. The use of a personal digital assistant, most often run on a smartphone, to support a smart-city citizen is described in [13], where the authors propose a software prototype of a personal digital assistant 2.0 that, based on soft computing and cognitive computing methods, improves calendar and mobility management in smart cities. On the other hand, there are many publications on the analysis of social behavior in urban environments using multi-agent systems or agent-based models. Malleson, in [14], emphasizes the need to combine big data and agent-based modeling tools to analyze a smart city. Karmakharm and Richmond, in [15], describe a simulation of pedestrian behavior in the event of a threat in public space. Meanwhile, Sandhu et al., in [16], present the implementation of a model, based on intelligent agents, for controlling streetlights in a smart city. In their study [17], Olszewski, Pałka, and Turek analyze the problem of traffic jams in an office district with regard to car-sharing; their agent-based model simulates the socio-economic behavior of the employees of the so-called Mordor of Warsaw.

Geo-sensing enables context-aware analyses of physical and social phenomena. Moreover, context-aware analysis can potentially enable a more holistic understanding of spatiotemporal processes [18], where the authors discuss the possibilities of integrating spatiotemporal contextual information with human and technical sensor information. Among the different types of sensors used to collect such information, they mention in situ sensors, technical remote sensors, and human agents, as discussed by Sagl, Resch, and Blaschke [19]. Resch, in [20], defines human agent data as human-generated measurements, distinguishing situations in which humans generate data themselves (subjective observations) from those in which humans carry "ambient sensors" that measure external parameters. Attempts have also been made in the literature to interpret data acquired by a "human agent" using an interactive location-based service (iLBS), e.g., to sense cultural-historic facts in the landscape (see [21]).

Cellular automata (CA) can be used to simulate urban dynamics and land-use changes effectively. Several authors performed simulations of urban development and land-use changes using GIS-based cellular automata (see [22–25]). Li et al., in [26], indicate that using parallel computation techniques can significantly improve the performance of large-scale urban simulations. Agent-based models are applied to increase the intelligence and flexibility of planning support systems. Saarloos et al., in [27], developed a framework in which an agent organization consists of three types of agents: "interface agents" to improve the user–system interaction; "tool agents" to support the use and management of models; and "domain agents" to provide access to specialized knowledge.

Imottesjo and Kain, in [28], developed a prototype mobile augmented reality (MAR) tool, Urban CoBuilder. The application facilitates the participative planning of urban space to increase bottom-up and multi-stakeholder inclusion. Yan Zhang, in [29], prototyped CityMatrix, an evidence-based urban decision support system augmented by artificial intelligence (AI) techniques, including machine learning simulation predictions and search algorithm optimization; Zhang investigated the strength of these technologies to augment the ability to make better urban decisions. Allen, Regenbrecht, and Abbott, in [30], investigated smartphone-based augmented reality as a tool for aiding public participation in urban planning by developing a prototype system that showed 3D virtual representations of proposed architectural designs visualized on top of the existing real-world architecture. The authors investigated whether using a smartphone augmented reality system increases the willingness of the public to participate and the perceived participation in urban planning.

Jing and Hai-xing [31] built a support vector machine (SVM) model to predict the trends of coordinated development. The authors compared the method with an artificial neural network, decision tree, logistic regression, and naïve Bayesian classifier regarding the urban ecosystem coordinated development prediction for the Guanzhong urban agglomeration.

Ultsch, Kretschmer, and Behnisch, in [32], used techniques of machine learning and data mining to discover comprehensible and useful structures in the multivariate municipality data. As Behnisch and Ultsch in [33] indicate, "Urban Data Mining represents a methodological approach that discovers logical, mathematical and partly complex descriptions of urban patterns and regularities inside statistical data".

In the conducted research, multi-agent systems were adopted as the tool for modeling and simulation. They enabled the implementation of the assumptions of the actor–network theory for modeling social processes in the urban space of a smart city. A multi-agent system (MAS, or a "self-organized system") is a computerized system composed of multiple interacting intelligent agents (see [9,34]). Multi-agent systems can solve problems that are difficult or impossible for an individual agent or a monolithic system to solve. The primary assumptions of MAS are communication amongst the agents and their autonomy. The notion of an agent, as currently used in urban simulation models, refers to a kind of automaton that mimics the behavior of urban agents in a predetermined way. Portugali, in [35], describes CogCity (cognitive city), an urban simulation model that explicitly incorporates in its structure the role of three cognitive processes that typify the behavior of human agents: Information compression, cognitive mapping, and categorization. Moreover, CogCity demonstrates the possibility and usefulness of agent-based and cellular-automata urban simulation models that combine top-down and bottom-up processes in one model. Projects from MIT's SENSEable City Lab foster the vision of the real-time city by providing "a feedback loop between people, their actions, and the city".

The issue of public opinion formation is the subject of studies conducted by Deffuant, Amblard, and Weisbuch in [36], and by Hegselmann and Krause in [37]. These authors consider social opinion formation through consensus, polarization, and fragmentation, investigating various models of the dynamics of continuous opinions by analytical methods as well as by computer simulations. Meanwhile, the rapid development of advanced technologies (IoT, wearable computing, etc.) is driving the process of connecting real-world objects such as buildings, roads, household appliances, and human bodies to the Internet via sensors and microprocessor chips that record and transmit data such as sound waves, temperature, movement, and other variables. This supports the development of smart citizens (see [38]).

In [39], Jacobs points to a correlation between urban form and urban performance, e.g., the quality of life, vibrancy, and safety. Yan Zhang, in [29], takes this a step further and shows the correlation between urban form and multiple aspects of urban performance. The 17 defined indexes represent 17 aspects of the urban performance of a city district, grouped into four high-level indexes: Density, diversity, proximity, and energy.

Sociological theories have been a vital source of inspiration for the authors of this article. Source studies include the theory of social impact developed by Nowak and Latané, which describes the interaction among members of large groups and the stabilization of opinions in groups [40,41]. Nowak, Szamrej, and Latané, in [42], argue for "[modeling] the change of attitudes in a population resulting from the interactive, reciprocal, and recursive operation [of] Bibb Latané's theory of social impact, which specifies principles underlying how individuals are affected by their social environment". Goffman's dramaturgical theory [8] and Latour's actor–network theory (ANT) [3] have also been of crucial importance for the conducted research.

According to Latour's actor–network theory (ANT), an actant, an advanced sensor, is the basic component of all networks; it is a factor influencing other factors. ANT is a theoretical and methodological approach to social theory in which everything in the social and natural worlds exists in constantly shifting networks of relationships [3]. It posits that nothing exists outside those relationships. All the factors involved in a social situation are on the same level; there are thus no external social forces beyond what and how the network participants are interacting at present. Objects, ideas, processes, and any other relevant factors are therefore seen as just as necessary in creating social situations as humans are. Latour distinguishes two types of ties between actants: Active and passive. The result of an active tie is not predetermined and depends on the mediation between the mediators, meaning that the result is uncertain and variable. In the case of passive ties, the situation is stable, and the translation proceeds in a predictable and predetermined manner.

The authors have also been inspired by the studies that take into account the opinion formation model with a "strong leader" (see [43,44]), meaning a leader who significantly influences the molding and modifying of the opinions and attitudes of the residents.

#### **3. Research Methodology**

The authors of this article see the relationship between the structure of a city, spatial order, the way residents live, and the level of their social activity. The issue of information asymmetry is of crucial importance for modeling these relationships (see [45]).

#### *3.1. Actor–Network Theory*

Depending on the context of the study, a given actant may be decomposed into a more complex actor–network order (e.g., a city may be analyzed as a system of relations between buildings, districts, authorities, residents, road infrastructure, and so on). The strength of the influence of individual actants, which results from various factors such as the level of mutual trust or identification with a given place or space, determines the strength of the relations. Furthermore, a number of such forces may impact a single actant at a given moment, each with its own power, often asymmetrical in relation to the others. This lack of symmetry, an uneven distribution of forces and influences, manifests itself in almost every dimension of the polysemous creation that is a city: from the right to decide, through the differing levels of capital (social, economic, cultural, and symbolic) of the city's individual users, to planning and urban solutions that may result in, among other things, ghettoization or the spatial exclusion of specific groups of residents. One may assume that, firstly, an uneven flow of information between particular actants, often conditioned by the previously mentioned level of social trust, underlies each of these urban asymmetries. Secondly, individual asymmetries overlap and form relations with other asymmetry systems, producing additional, much more intricate networks of mutual influences and interactions. The level of trust is crucial for the activity and social participation of the citizens.

The underlying assumption of the authors in relation to the concept of geospatial multi-agent system design is that modeling of social interactions is a non-linear generalized regression. It is, therefore, assumed that:


In their research, the authors investigate how the trust of residents changes with time: Both the level of mutual trust and the trust in social institutions, which in turn stimulates the growth of social involvement and social (geo)participation. The level of trust and its changes depend on factors such as, among others, place of residence, type of work, time spent in public transport or in public facilities and theatres, as well as the type of building development or the openness of space. Changing all of these parameters for a population of hundreds of thousands of people requires the use of parallel computing in multi-agent systems and numerical simulations covering millions of calculation epochs, which model decades of a city's functioning. The base level of the residents' identities, the intrinsic idea of deliberative democracy, the so-called strong leaders in the local community, as well as the specific genius loci of the city are all crucial in the process of changing the level of trust and involvement of residents.

#### *3.2. Modeling of the Smart City*

The crucial element of studying the development of a city and the way the urban network works using (broadly defined) game theory is determining whether the knowledge of the individual players (advanced sensors) is symmetrical or whether there is an informational asymmetry. Multi-agent systems are one of the tools used for modeling game theory and the theory of market mechanisms.

The model uses elements of game theory, emergence, sociology, and multi-agent systems. It assumes that agents, representing the residents of a city (sensors), move around the city and interact with each other. Interaction takes place during every act of verbal or non-verbal communication and involves the mutual decoding and simultaneous interpretation of the meaning of the symbols used by the other party. Interactions influence and change an agent's trust in other residents. The information-asymmetry phenomenon is easily modeled in a multi-agent system, where every agent (an autonomous software element), while making decisions based on private information and interacting with the remaining agents, holds information whose completeness may vary. The simulation is divided into small time quanta (e.g., 15 min), during which the residents interact with each other.

The city is modeled by a system of roads on which the agents move; a set of buildings, including stand-alone and multi-family residential buildings, factories, office buildings, governmental offices, health clinics, schools, and museums; and green areas, i.e., boulevards, parks, and water reservoirs. The city is also divided into districts distinguished by a set of general features characterizing both the district and its people, e.g., the office district is characterized by a significant share of office buildings, and the people in it are white-collar workers.

#### *3.3. Citizen Modeling—Agents*

An agent models a city resident and has a set of features that reflect its social character:


Besides the features above, each agent is assigned a place of residence (a residential building) and a workplace (an office building or a factory). During the simulation, the agents move around the city according to a daily rhythm. Residents navigate the city along a network of roads; occasionally, they may also go to a demonstration.

Demonstrations take place at so-called attractors: places that attract social interest and provoke extreme emotions. One such attractor is the Palace of Culture and Science in Warsaw, whose demolition is a continuing matter of dispute. A demonstration causes a clash of extreme emotions among the participants and often results in a change of stance regarding the fate of a given attractor.

The devised model assumes that, from Monday to Friday, each agent (human agent) leaves for work in the morning (06:00 to 08:00), stays at work for eight hours, and then returns home. Some agents go to a governmental agency during work hours (10:00 to 12:00) or to a doctor (09:00 to 10:00). After work, and on weekends, some of the agents leave the city (18:00 to 23:00). Agents meet and interact when traveling, walking around the city, arriving at work, or visiting places of entertainment.
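The weekday rhythm described above can be sketched as a simple schedule in Python (an illustrative reimplementation, not the authors' GAMA code; the probabilities of the optional trips and the place labels are our assumptions):

```python
import random

def plan_weekday(rng=random.random):
    """Build one agent-day as (start_hour, end_hour, place) entries.

    Hours follow the model description; the probabilities of the optional
    trips are illustrative assumptions, not values from the paper.
    """
    leave = 6 + 2 * rng()                        # leaves home between 06:00 and 08:00
    plan = [(0.0, leave, "home"),
            (leave, leave + 8, "work"),          # eight hours at work
            (leave + 8, 24.0, "home")]
    # Optional trips; later entries override the base blocks in place_at().
    if rng() < 0.10:
        plan.append((10.0, 12.0, "governmental_agency"))
    if rng() < 0.05:
        plan.append((9.0, 10.0, "doctor"))
    if rng() < 0.20:
        plan.append((18.0, 23.0, "out"))         # evening outing / leaving the city
    return plan

def place_at(plan, hour):
    """Where the agent is at a given hour; the last matching entry wins."""
    place = "home"
    for start, end, where in plan:
        if start <= hour < end:
            place = where
    return place
```

During each 15-min quantum of the simulation, agents whose current places coincide would then be candidates for interaction.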

In addition to ordinary residents, there are also so-called leaders: Social activists who want to impact society so that, following the idea of Václav Havel's essay "The Power of the Powerless" [46], the many who are weak gain influence over the molding of the urban fabric through mutual interactions. Leaders tend to appear near controversial events. Positive leaders are those with extremely high and affirmative trust in people; they have a beneficial influence on the social trust of others. Negative leaders have extremely low trust in people, thereby lowering the trust of the people with whom they interact.

#### *3.4. Agents' Interactions*

Meetings between residents take place when they are in the same buildings and when the agents are moving around the city. During these meetings, interactions occur that affect the agents' characteristics. Such interactions are a symptom of human spatial behavior resulting from the social distances described by Hall in [7]. A generalized form of Tobler's first law serves as the starting point for modeling the processes of urban interaction in accordance with Latour's actor–network theory; the law states that "everything is related to everything else, but near things are more related than distant things." In the modeled process, the objects (actors) influencing each other are actants (sensors) understood as the residents of the city, buildings, the spatial arrangement, the dominating function of a given city district, and so on. It is important to emphasize that the "nearness" of the actants (in this context, the residents) can mean not only distance in geographical space, but also similarity of characteristics, shared interests and views, or a social media connection. It is also assumed that interactions may occur through social media or online, without the need for the agents to meet physically. Thus, what is taken into account is not the physical distance between the agents (Euclidean distance in the space of the city), but their distance in the social network, which depends on differences in education or age. The following interactions, two of them defined by Goffman in [8] and the remainder proposed by the authors, are considered:


Additionally, there is an assumption that the mere presence of a citizen in a given district or a building affects her/his characteristics and, in particular, her/his trust in other people. Being in an office, healthcare facility, at a workplace, school, or in an industrial or office district reduces trust (temporarily or permanently). On the other hand, being in a park, museum, on the boulevards, or in a historic or recreational district increases trust (temporarily or permanently). These changes are called the location influence.
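The social-network "nearness" used when pairing agents, which depends on differences in education or age rather than on Euclidean distance, can be sketched as follows (an illustrative formulation; the weights and the reciprocal form are our assumptions, not taken from the paper):

```python
def social_distance(a, b, w_edu=1.0, w_age=1.0):
    """Distance in attribute space: the generalized 'nearness' of two agents.

    a and b are agent dicts with normalized 'edu' and 'age' features in [0, 1];
    the weights w_edu and w_age are illustrative assumptions.
    """
    return w_edu * abs(a["edu"] - b["edu"]) + w_age * abs(a["age"] - b["age"])

def interaction_strength(a, b, scale=1.0):
    """Near agents (small social distance) influence each other more strongly,
    in the spirit of Tobler's first law applied to the social network."""
    return 1.0 / (1.0 + scale * social_distance(a, b))
```

Identical agents thus interact with full strength 1.0, and the strength decays as their attribute profiles diverge.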

The characteristic analyzed by the authors is the change in the level of mutual trust between residents ("human agents") resulting from the multidimensional influence of the actants. The level of "trust" tends to "equalize" as the agents representing individual residents interact, although this is not an immediate or rapid process. The conducted simulations aim to determine how the long-term change of people's trust (mutual and in institutions) over time affects the level of social participation and, indirectly, the development of an open civil society and deliberative democracy.

The model includes three groups of actants (agents) interacting with each other in the city:


In the adopted model, the authors of the study take into account the various modifying functions (see Figure 1) of the trust parameter in the interactions between two agents:

• The "linear" function, which is the basis for the formulation of the others. For the interaction of agent *i* with agent *j*, it is assumed that:

$$trust_{People}^{i} \leftarrow trust_{People}^{i} + c \cdot \Delta_{trust_{People}} \tag{1}$$

$$c = \frac{1}{2}\left[k_{edu} \cdot \left(1 - edu^{i}\right) + k_{hap} \cdot \left(1 - hap^{i}\right) + k_{wea} \cdot \left(1 - wea^{i}\right) + k_{age} \cdot \left(1 - age^{i}\right) + k\right] \tag{2}$$

$$\Delta_{trust_{People}} = trust_{People}^{j} - trust_{People}^{i} \tag{3}$$

where:

$k_{edu}$—education parameter change modifier;

$k_{hap}$—satisfaction parameter change modifier;

$k_{wea}$—wealth parameter change modifier;

$k_{age}$—age parameter change modifier;

$k$—change modifier.

• The "reinforcing" function differs from the linear function in the manner of calculating parameter Δ*trustPeople* :

$$\Delta_{trust_{People}} = \tan\left(trust_{People}^{j} - trust_{People}^{i}\right) \tag{4}$$

• The "diminishing" function also differs from the linear function in the manner of calculating parameter Δ*trustPeople* :

$$\Delta_{trust_{People}} = \tanh\left(trust_{People}^{j} - trust_{People}^{i}\right) \tag{5}$$
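Under the definitions above, the three trust-modifying functions and the update rule of Equations (1)–(3) can be sketched in Python as follows. This is an illustrative sketch, not the paper's GAML code; in particular, the helper names and the default modifier weights (`k_edu = k_hap = k_wea = k_age = 0.25`, `k = 0`) are assumptions.

```python
import math

def delta_linear(trust_j, trust_i):
    # Equation (3): plain difference of the two agents' trust levels.
    return trust_j - trust_i

def delta_reinforcing(trust_j, trust_i):
    # Equation (4): tan amplifies the difference (|tan(x)| >= |x| near 0).
    return math.tan(trust_j - trust_i)

def delta_diminishing(trust_j, trust_i):
    # Equation (5): tanh damps the difference (|tanh(x)| <= |x|).
    return math.tanh(trust_j - trust_i)

def coefficient_c(agent, k_edu=0.25, k_hap=0.25, k_wea=0.25, k_age=0.25, k=0.0):
    # Equation (2): susceptibility of agent i to a trust change, built from
    # its normalized (0..1) education, satisfaction, wealth, and age.
    # The default weight values are illustrative assumptions.
    return 0.5 * (k_edu * (1 - agent["edu"]) + k_hap * (1 - agent["hap"])
                  + k_wea * (1 - agent["wea"]) + k_age * (1 - agent["age"]) + k)

def update_trust(agent_i, agent_j, delta_fn=delta_linear):
    # Equation (1): trust_i <- trust_i + c * delta(trust_j, trust_i).
    c = coefficient_c(agent_i)
    agent_i["trust"] += c * delta_fn(agent_j["trust"], agent_i["trust"])
    return agent_i["trust"]
```

With the linear variant, repeated interactions move the two trust levels towards each other without ever overshooting, which matches the slow "equalizing" described earlier.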


Moreover, it is assumed that the agent's trust used in the above dependencies is weighted according to the place where the agent currently resides. The trust of an agent staying at the place of residence counts as 50% higher, while for an agent staying at the workplace it counts as 20% lower. When the agent is in an entertainment location, trust counts as 10% higher, whereas for an agent staying at a governmental agency or health clinic it counts as 50% lower. For an agent residing in an industrial area (Old Factory or Mordor), trust counts as 15% lower, while in the GreenLand or OldTown districts it counts as 15% higher.
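The location influence described above amounts to a simple lookup of multipliers. The following sketch illustrates this; the `LOCATION_MULTIPLIER` mapping, its keys, and the function name are illustrative assumptions, while the percentages are those stated in the text.

```python
# Trust weighting by the agent's current location (values from the text;
# the key names are illustrative assumptions).
LOCATION_MULTIPLIER = {
    "home": 1.50,           # 50% higher at the place of residence
    "workplace": 0.80,      # 20% lower at the workplace
    "entertainment": 1.10,  # 10% higher in an entertainment location
    "government": 0.50,     # 50% lower in a governmental agency
    "health_clinic": 0.50,  # 50% lower in a health clinic
    "industrial": 0.85,     # 15% lower in Old Factory / Mordor
    "greenland": 1.15,      # 15% higher in GreenLand
    "oldtown": 1.15,        # 15% higher in OldTown
}

def effective_trust(trust, location):
    # Locations not listed leave the trust level unchanged.
    return trust * LOCATION_MULTIPLIER.get(location, 1.0)
```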

The introduced methodology is used and validated in the following section.

**Figure 1.** Linear (y = x), reinforcing (y = tan(x)), and diminishing (y = tanh(x)) functions.

#### **4. Model City**

To validate the devised model of actant interaction, the authors developed a "model city" comprising seven hexagonal districts, each with a dominating type of function and structure (Table 1 and Figure 2). One million residents inhabit the city, as illustrated by the dot distribution map (Figure 3). Each of the dots, representing 1000 inhabitants, was implemented in the system as an agent with specific characteristics such as age, education, wealth, marital status, number of children, identity (level of identification with the city), trust in people, and trust in institutions. The variation (standard deviation) of the agent characteristics in each district was assumed at 20% for the model data. The parameters of the agents were drawn from a normal distribution.
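The sampling of agent characteristics could be sketched as follows. The trait list and the district means used here are illustrative; only the normal distribution with a 20% standard deviation comes from the text, interpreted here as 20% of the district mean.

```python
import random

# Traits assigned to each agent (illustrative subset of those named above).
TRAITS = ["age", "education", "wealth", "identity",
          "trust_people", "trust_institutions"]

def draw_agent(district_means, rel_std=0.20, rng=random):
    """Draw one agent's traits from normal distributions around the
    district means; values are clipped to the 0..1 range (an assumption)."""
    agent = {}
    for trait in TRAITS:
        mean = district_means[trait]
        value = rng.gauss(mean, rel_std * mean)  # std = 20% of the mean
        agent[trait] = min(1.0, max(0.0, value))
    return agent
```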

The ESRI ArcGIS application (ArcGIS is software developed by the Environmental Systems Research Institute; GIS is an abbreviation of Geographic Information System) was used to develop the source spatial database, enabling the preparation of a set of thematic layers (land cover, buildings, communication routes, district borders, distribution of residents) as shapefiles. These layers were used to build a multi-agent system in the GAMA simulation platform (see [49,50]). GAMA is a modeling and simulation-development environment for building spatially explicit agent-based simulations (see [50]). It is a multiple-application-domain platform using a high-level and intuitive agent-based language. With GAMA, users can undertake most of the activities related to modeling, visualizing, and exploring simulations using dedicated tools.

**Table 1.** Model city districts (mean values, in percent); standard deviation is equal to 20%.


**Figure 2.** Model city.

**Figure 3.** Dot distribution map (each dot represents 1000 inhabitants of the model city).

Thanks to the simulations carried out for the model city, it was possible to calibrate the multi-agent system properly, i.e., to determine the values of individual parameters, which then enabled the use of the devised model for a real urban agglomeration. For example, the 3650 iterations used by the authors correspond to a period of 10 years (one iteration per day), during which the level of trust of residents changes significantly (and in a measurable way). The authors repeatedly modified the numerical values of particular factors (e.g., the change in the level of trust of individual actants resulting from their mutual interactions) so that the parameterization of the model corresponds to the changes observed in real cities. Thanks to this iterative calibration of the system, it was possible to determine model parameters adequate for the research of real metropolises.

The analyses made it possible to check the spatial distribution of changes in the level of trust of the residents of particular districts (Figures 2 and 3) in a long-term (decades-long) process. Thanks to the use of a multi-agent system, it was possible to simulate many years of social processes in computational cycles lasting from several dozen minutes to several hours.

The study also examined the influence of some strong "positive" or "negative" leaders, the impact of the adopted function modifying the traits of the agents (linear, reinforcing, diminishing), as well as specific spatial problems in the model city. In the research, the authors adopted four analytical scenarios:


Each of the scenarios is associated with the engagement of a specific group of residents (e.g., the elderly, the less affluent, or the residents of a given part of the city).

Each agent is characterized by the "susceptibility" parameter, which determines the probability of an agent taking part in the social debate, demonstration, or protest. This characteristic is determined on the basis of the agent's other parameters, such as age, education, family, wealth, the level of identification with the city, and so forth. What is key, however, is a given agent's place of residence and the proximity (spatial or social) to the place where a problem, such as the revitalization of a district or the demolition of a controversial town hall, occurs. "Ordinary" agents engage (with a certain probability) in social life after work or on weekends. Only agents representing the "strong leaders" always remain in the conflict places. These agents do not change their level of involvement or trust during the interaction. For positive leaders, it is 1.0, while it is 0.0 for negative ones.
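One way the "susceptibility" parameter might combine an agent's traits with its distance to the conflict place is sketched below. The weights and the exponential distance decay are assumptions; the fixed engagement values for strong leaders (1.0 for positive, 0.0 for negative) come from the text.

```python
import math

def susceptibility(agent, distance_to_conflict, decay=0.001):
    """Probability-like score for joining a debate, demonstration, or protest.
    The trait weights and the decay constant are illustrative assumptions."""
    base = (0.3 * agent["identity"] + 0.3 * agent["education"]
            + 0.2 * agent["trust_people"] + 0.2 * (1 - agent["wealth"]))
    # Engagement falls off with (spatial or social) distance to the problem.
    return base * math.exp(-decay * distance_to_conflict)

def engagement_probability(agent, distance_to_conflict,
                           is_leader=False, positive=True):
    # Strong leaders never change: always 1.0 (positive) or 0.0 (negative).
    if is_leader:
        return 1.0 if positive else 0.0
    return susceptibility(agent, distance_to_conflict)
```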

Making use of the devised multi-agent system and the GAMA toolset environment, 3650 computational epochs were carried out. The characteristics of individual agents changed in each iteration because of contacts with individual actants (residents and spatial objects). Using GIS tools, the resulting data were subjected to spatial aggregation analyzing the change in the average level of trust of agents residing in a given district of a model city. First, the authors of the article made calculations, the purpose of which was to check how the particular trust parameter modifying functions works (Figures 4 and 5).
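The epoch loop and the district-level aggregation could be sketched as follows, assuming the linear update of Equation (1) with a constant coefficient; the pairing of agents in each epoch and the data layout are simplified assumptions, not the GAMA implementation.

```python
from collections import defaultdict
import random

def run_simulation(agents, epochs=3650, c=0.05, rng=random):
    """Simplified sketch: in each daily epoch, one random pair of agents
    interacts and their trust levels move towards each other (Equations (1)
    and (3)); the single-pair-per-epoch scheme is an assumption."""
    for _ in range(epochs):
        i, j = rng.sample(range(len(agents)), 2)
        agents[i]["trust"] += c * (agents[j]["trust"] - agents[i]["trust"])
    return agents

def mean_trust_by_district(agents):
    """Spatial aggregation: average trust of agents residing in each district."""
    sums, counts = defaultdict(float), defaultdict(int)
    for a in agents:
        sums[a["district"]] += a["trust"]
        counts[a["district"]] += 1
    return {d: sums[d] / counts[d] for d in sums}
```

Running many epochs narrows the gap between districts that start at different trust levels, which is the equalizing tendency described earlier.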

**Figure 4.** The attractor activity and the level of mutual social trust in a model city through a simulated time of 10 years in the presence of 20 strong positive leaders.

**Figure 5.** Comparison of the fan-idol and fan-anti-idol relations on the level of mutual social trust in a model city through a simulated time of 10 years in the presence of 20 strong positive leaders.

In Figure 4, one can observe the comparison of the attractors' activity and the level of mutual social trust in a model city through a simulated time of 10 years in the presence of 20 strong positive leaders. All plots rise steadily, but with slightly different slopes. Moreover, in the period of the attractors' activeness (days 1200–2400), the curves move away from each other. The results indicate that there are differences between individual attractors, which result from their various geospatial settings.

In Figure 5, one can observe the comparison of the fan-idol and fan-anti-idol relations on the level of mutual social trust in a model city through a simulated time of 10 years in the presence of 20 strong positive leaders. Interestingly, up to around day 1600 of the simulation (four years and four months), mutual trust is slightly higher when the fan-idol and fan-anti-idol relations are inactive. After that, however, the active relations begin to dominate, and at the end, after 10 years, mutual trust is 0.4% higher with the relations active than with them inactive.

To sum up, the analysis of the various attractors and the various trust-modifying functions shows only minor differences, which result from the characteristics of a given area rather than from differences in the algorithm used.

The variants of analytical simulations implemented are presented below.

#### *4.1. Terra Incognita (Unspecified Space)*

Research question: Will the district be built in a "closed" way (gated communities) or, owing to an increased level of social (geo)participation and trust, in an "open" way? This question is of particular interest to the residents of the three marked districts, and to young, relatively wealthy, and well-educated people.

Figure 6 shows the comparison between the number of strong leaders and the level of mutual social trust in a model city through a simulated time of 10 years, with fan-idol and fan-anti-idol relations inactive. In the absence of strong leaders, the slope of mutual trust decreases over time, and the increase over 10 years is 1.12%. For 20 strong positive leaders, trust increases by 5.44% within the simulation, but the slope changes; during the manifestation, it slightly decreases. This can be explained by the concentration of all the positive leaders in one district of the city. In the case of 20 strong negative leaders, the authors observe a slight increase in trust at the beginning, but around day 2000 of the simulation (around year 5.5), it begins to drop. Ultimately, after 10 years, trust increases by 1.12%, which is the result of accumulated negative opinions during protests. Similar dependencies, up to the exact values, occur for the other attractors. Finally, the existence of both 20 positive and 20 negative strong leaders results in a different outcome than the absence of strong leaders. In particular, one can note that the positive leaders have more impact on mutual trust than the negative ones: in this case, mutual trust after 10 years of simulation increases by 2.94%.

**Figure 6.** Comparison of the number of strong leaders and the level of mutual social trust in a model city through a simulated time of 10 years, with fan-idol and fan-anti-idol relations inactive.

In Figures 7 and 8, one can see that the changes in the level of trust differ significantly in individual districts of the model city. These changes also have different intensity over time. This process depends not only on the number of strong leaders, but also on the characteristics of residents of particular districts and the level of their involvement in the problem of this attractor. Positive leaders influence a slight increase in trust in Bedroom, while negative leaders considerably reduce the level of trust of the residents of this district. For those who are not very interested in the spatial development of the new district (residents of Old Town and GreenLand), the level of trust decreases both in the presence of positive and negative leaders, although with varying intensity.

**Figure 7.** Changes in the level of mutual social trust in a model city through a simulated time of 10 years (20 negative leaders in the "Terra incognita" attractor); iteration 0, 1200, 2400, 3650.

**Figure 8.** Changes in the level of mutual social trust in a model city through a simulated time of 10 years (20 positive leaders in the "Terra incognita" attractor); iteration 0, 1200, 2400, 3650.

#### *4.2. Old Factory Revitalization*

The revitalization of the building after the Steelworks closes down leads to gentrification and to loft spaces for the wealthy. Studies conducted on the spatial and descriptive data characterizing the model city have shown that this problem is particularly significant (and affects them negatively) for the relatively poor employees of the closed-down Steelworks and the residents of this region (this district and the spatially close parts of the bedroom community, the center, and the business district). Figure 9 shows the zones of strong and weak impact of the "Old Factory" attractor.

**Figure 9.** Zone of strong and weak impact of the "Old Factory" attractor.

In Figure 10, one can observe that the overall increase in trust by 8% does not mean an even increase of confidence in all districts of the model city. The revitalization of the factory and the creation of residential lofts is especially important for the residents of this district and the two neighboring ones (the business district and the city center). It is also important for those residents of the city who work (or have worked) in the liquidated factory, regardless of where they live. For the inhabitants of the bedroom suburbs, the process of revitalization is of little importance; for the wealthy inhabitants of Old Town and GreenLand, who are not interested in this subject, it even results in a lowered level of social activity. On account of the obtained results, it is possible to state the following:


**Figure 10.** Changes in the level of mutual social trust in a model city through a simulated time of 10 years (20 positive leaders in the "Old Factory" attractor); iteration 0, 1200, 2400, 3650.

#### **5. Spatiotemporal Modeling of the Warsaw Area, Poland**

With an appropriately calibrated model and its implementation in the form of a multi-agent system, the authors attempted to conduct research and simulation on real data. The agglomeration of Warsaw in Poland, with its 1,754,000 inhabitants, was the test object (Figure 11). The spatial data used in the study come from the general geographic database, which contains data at a level of accuracy equivalent to analogue maps at a scale of 1:250,000; the data were up-to-date as of 2016. As with the model city, the distribution of residents was modeled in the form of a dot distribution map (Figure 12), where each of the 1754 dots (agents) represents 1000 inhabitants. The data contain a set of characteristics defining the demographic, social, and cultural features of individual residents (Table 2).

The research in [51,52] examines the level of trust of Poles, both in public institutions and in each other. It indicates a direct relationship between trust, civic activity, and the level of education. It turns out that "trust increases civic activity only after reaching or exceeding the threshold of secondary education." In addition, in 2015, at the request of the City Hall of Warsaw, a study was conducted on the quality of life of the residents of Warsaw districts, in which a set of questions was devoted to trust. Although the level of trust in friends and family remains close to 90%, confidence in politicians (17%), journalists (34%), and local authorities in Warsaw (35%) remains very low.

Analytical experiments were carried out for Warsaw by simulating long-term social processes for three selected test areas (Figure 12):




**Figure 11.** Warsaw (PL) and its districts.

**Figure 12.** Three attractors in Warsaw and dot distribution map.

#### *5.1. Parade Square (Plac Defilad)*

Together with the Palace of Culture and Science, Parade Square has been a source of controversy for years. The background of this conflict is primarily generational and, thus, historical. The problem lies not only in its development, but also in the possible ways of integrating this space into the city.

The seniors born or living in Warsaw declare low trust and distaste for this area. To this group, this place is connected with the period of communist domination, complicated history, and the Soviet Union. They would like to demolish this space along with the Palace of Culture and Science. This group of people has a very high level of identification with the city and its space, especially with the city center. For the youth and people in their prime (up to around 35 years of age), the Palace remains a symbol of Warsaw and the location of many cultural activities, as well as a place for meetings or dating. Young people, in particular, have been trying to revitalize this area for years. They would like to combine the Palace of Culture and Science with space for everyone, where there is a place for greenery, leisure, and a body of water. A manifestation of this is, for example, the so-called Central Park (https://parkcentralny.pl/), which is a project aimed at creating the Green Heart of Warsaw.

In Figure 13, the change of average mutual trust of residents for Parade Square with 20 negative and 20 positive strong leaders can be observed. The mutual trust decreases by 6.8%, which is the opposite of the case of the Model City, where mutual trust grew in a similar situation (20 positive and 20 negative strong leaders). This process is caused by different demographic characteristics of Warsaw (a real urban agglomeration) and the model city.

**Figure 13.** Change in the average mutual trust of residents for Parade Square with 20 negative and 20 positive strong leaders.

#### *5.2. "Mordor" on Domaniewska Street*

In the 1990s, the process of changing this part of the city from an industrial function to a service function began. At the moment, Domaniewska Street is one of the main streets of the largest office complex in Warsaw. Because of its traffic-related problems, the residents of the city have jokingly named it Mordor, after the dark land from the novels of J.R.R. Tolkien.

The problem with this part of the city is the extreme intensification of office space, lack of greenery, and a limited number of parking spaces. A thick line seems to separate the Mordor of Warsaw from the city (for communicational and architectural reasons, but also mentally). This part of Warsaw has always been associated with a low level of identification with the city. After its change from an industrial area to a service one, the problem of low-level identification and trust remained (which also applies to the attitude of officials and corporations). It is a place where one has to be (work), not where one wants to be. Places that appear in Mordor (restaurants, cafés) are there primarily to serve the corporations; there are no cultural or recreational places (except for fitness centers), and so on.

In this part of Warsaw, all actants have a very low level of trust in public transport and, therefore, in officials as well (employees as well as visitors or residents). For people from the outside, Mordor is a place of very low trust and identification (especially for those who live close to this area, e.g., Mokotów, Ursynów, and so on, neighboring districts). People living in the outlying districts are somewhat indifferent to this place. On the other hand, this place "draws" newcomers who decide to buy flats there. As a result, a sense of new urban identity arises, in a way forged as an opposition to the inhabitants of other parts of Warsaw.

The critical problem of this part of Warsaw is the extreme congestion of streets during rush hour. Almost all of the 100,000 employees of corporations in Mordor commute to work using the company car. An increased mutual trust would allow the rationalization of commuting, e.g., by using carpooling (see [17]).

In Figure 14, one can observe the change of the average mutual trust of residents for Mordor with 20 strong negative leaders. A constant drop of mutual trust (−16.06%) can be observed (the drop is more than twofold compared to Parade Square, see Figure 13). It means that for the city of Warsaw, in the case of strong negative leaders, the level of mutual trust is threatened by a significant reduction. An increase in social trust, and thus also in social participation (visible, e.g., through the joint use of company cars, leading to increased traffic capacity), therefore requires strong inspiration from "positive leaders" of social change in this office area.

**Figure 14.** Change in the average mutual trust of residents for Mordor with 20 strong negative leaders.

#### *5.3. Miasteczko Wilanów*

This is a recently built district of the city with gated communities inhabited mainly by newcomers with a low sense of identification with Warsaw. The inhabitants of the "old" Wilanów and of the other Warsaw districts (apart from Białołęka) do not trust the residents of the "new" Wilanów. The relations within this district are also complicated: those who live in the prestigious buildings built at the very beginning do not trust those who came to live in buildings of a much lower standard intended for the middle class (a typical class conflict).

What characterizes this place is the lack of kindergartens, schools, sports fields, and so on, as well as the underdeveloped public transport. Also, there is no broadly understood public space or any green areas.

An increased mutual trust and level of identification would not only facilitate the active social participation of residents, but also improve relations between the residents of the various parts of Wilanów (the "old" and the "new") and the other districts of the city.

In Figure 15, one can observe the change of the average mutual trust of residents for Wilanów with 20 strong positive leaders. In this case, mutual trust increases steadily up to the level of 73.48% (an increase of 8.94%). Interestingly, for an analogous number of leaders, the drop for Mordor (see Figure 14) is nearly twice the increase for Wilanów, which means that Warsaw is a city endangered by declining mutual trust: it is more difficult to increase trust than to decrease it.

**Figure 15.** Change in the average mutual trust of residents for Wilanów with 20 strong positive leaders.

#### **6. Conclusions and Future Work**

After completing the simulations and assessing the results, it seems that, thanks to the developed concept and the prototype of a multi-agent information system (which uses spatial data, demographic information, and sociological, mathematical, and urban theories), performing complex geospatial big data analyses, geospatial information extraction, and data mining is possible. Owing to the idea of "citizens as sensors" represented by "human agents", game theory, information asymmetry, urban morphology, and multi-agent systems, it was possible to model changes in the residents' activity over decades, even in the case of agglomerations of hundreds of thousands of residents. Therefore, it is a useful tool not only for conducting urban big data analyses in urban studies, spatial science, or applied social science, but also for shaping the smart cities of the future. The analysis of the results for the model city and Warsaw shows that each city has a "potential for mutual trust" that emerges from the distribution of its buildings, its road network and, of course, its inhabitants. This potential of social trust can be substantial, causing an increase in mutual trust, as in the case of the model city. It may also be small, as in the case of Warsaw, where it is difficult to increase mutual trust. However, thanks to strong leaders, it is possible to shape trust and support the process of increasing the activity of residents (active sensors) and their social participation in creating a smart city.

Therefore, the following may be stated:


at a given time. Research on real data enables following the initial situation and then applying the results from the model city to, for example, Warsaw. Consequently, one can determine the factors that directly or indirectly influence the increase of participative activities and then strengthen those elements that require reinforcement.


It is also worth emphasizing that the developed model can (after minor modifications) also be used in other applications, e.g., stimulating residents to install photovoltaic panels [53] or conducting multilateral real-estate negotiations for building plots in a distributed multi-agent environment [54].

As future work, it is worth considering supplementing the proposed agent-based model with a game-theoretical treatment, in particular to identify possible social dilemmas (such as public goods, the tragedy of the commons, and the trust dilemma) and their potential impact on the development of public trust among groups of interacting individuals with different interests in the considered urban agglomeration. The authors also plan to expand the model with elements of gamification between residents to model different forms of the social activity of city residents; specifically, it is planned to model social gamification in a smart city, which is likely to stimulate the installation of photovoltaic panels. A smart city, understood not as intelligent city infrastructure but as a smart, open geoinformation society, is shaped by the "power of the powerless," which can be reinforced.

The developed model is universal; it can be easily parameterized on the basis of any input data, e.g., social, sociological, or economic. The model will be verified and tested in other agglomerations and different cities. These will include cities characterized by a higher baseline level of trust (Scandinavian cities) and culturally different areas (Singapore, Masdar City).

**Data Availability:** The code of the project, in the GAML language, together with the data used to support the findings of this study, has been deposited in the git repository: https://gitlab.com/PiotrPowerPalka/smartcitygrowthgama.git.

**Author Contributions:** Conceptualization, R.O. and P.P.; methodology, P.P., R.O., A.T., and B.K.; software, M.B. and P.P.; validation, A.T. and T.P.; formal analysis, R.O. and T.P.; investigation, R.O., P.P., and A.T.; resources, A.T.; data curation, P.P. and B.K.; writing—original draft preparation, R.O., P.P., A.T., and B.K.; writing—review and editing, A.T.; visualization, R.O., A.T., and P.P.; supervision, R.O.; project administration, A.T.

**Funding:** This work was supported by the FabSpace 2.0 project, funded by the EU's Horizon 2020 research and innovation program under grant agreement No. 693210.

**Conflicts of Interest:** The authors declare that there is no conflict of interest regarding the publication of this paper.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Scale-Free Features in Collective Robot Foraging**

#### **Ilja Rausch \*, Yara Khaluf and Pieter Simoens**

IDLab, Ghent University, Technologiepark 126, B-9052 Ghent, Belgium

**\*** Correspondence: ilja.rausch@ugent.be; Tel.: +32-9331-49-75

Received: 19 April 2019; Accepted: 27 June 2019; Published: 30 June 2019

**Abstract:** In many complex systems observed in nature, properties such as scalability, adaptivity, or rapid information exchange are often accompanied by the presence of features that are scale-free, i.e., that have no characteristic scale. Following this observation, we investigate the existence of scale-free features in artificial collective systems using simulated robot swarms. We implement a large-scale swarm performing the complex task of collective foraging, and demonstrate that several space and time features of the simulated swarm—such as number of communication links or time spent in resting state—spontaneously approach the scale-free property with moderate to strong statistical plausibility. Furthermore, we report strong correlations between the latter observation and swarm performance in terms of the number of retrieved items.

**Keywords:** agent-based collective intelligence; multi-agent complex systems; scale-free properties; power law distribution; biologically inspired approaches and methods; collective foraging; physics-based simulation; methodologies for agent-based systems; multi-robot simulation

#### **1. Introduction**

Advances in computation have made it possible to record, simulate, and analyze multi-agent complex systems in nature, such as fish schools, bird flocks, locust swarms, and ant colonies. In many of these collective systems, various attributes were found to be *scale-free* [1], i.e., the attributes do not have a characteristic size or value. Examples of such scale-free features found in biological systems include, among others, (i) asymptotically scale-free correlation lengths of starling flocks [2,3]—the term *asymptotic* refers to the behavior of a variable (in this case spatial correlation) close to a limit (in this case an infinite flock size); (ii) scale-free fluctuations of velocity and orientation correlations in moving bacterial colonies [4]; (iii) time intervals between communication calls that follow a power law—which is the mathematical representation of a scale-free property—in pairs of zebra finches; and (iv) scale-free movement patterns found in models of foraging primates [5] or midge swarms [6].

One of the most prominent findings is that the number of interactions appears to be scale-free in various real-world *networks* of biological and social systems [7–9]. Multi-agent systems benefit from scale-free communication because it enables scalable, fast, and efficient information transfer [10–12]. An essential aspect of scale-free networks is that they represent complex topologies in which only a few nodes (called hubs) have a comparably high connectivity degree [7]. This small percentage of highly connected hubs makes scale-free topologies vulnerable to targeted attacks but exceptionally robust to random failures (which are likely to affect the vast majority of nodes that are not hubs) [13]. Furthermore, due to the high connectivity, the network diameter is small, which means that, on average, any two nodes can share their information over only a few hops [11], resulting in fast information transfer.
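For illustration, a few highly connected hubs already emerge from a minimal preferential-attachment process of the kind underlying many scale-free network models (in the spirit of Barabási and Albert). This sketch is an illustration of the concept, not code from the study; the function names are ours.

```python
import random

def preferential_attachment(n_nodes, rng=random):
    """Grow a network one node at a time; each new node links to an existing
    node with probability proportional to its degree. Storing one endpoint
    per degree occurrence makes uniform sampling degree-proportional."""
    degree_list = [0, 1]   # start from a single edge between nodes 0 and 1
    edges = [(0, 1)]
    for new in range(2, n_nodes):
        target = rng.choice(degree_list)
        edges.append((new, target))
        degree_list.extend([new, target])
    return edges

def degrees(edges):
    # Count how many edges touch each node.
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg
```

In a grown network of a few hundred nodes, the maximum degree is typically several times the average degree, i.e., a small number of hubs concentrate most of the connectivity.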

Inspired by the high prevalence of scale-free features in (socio-)biological systems, the aim of the current study is to examine whether scale-free attributes may also spontaneously emerge in *artificial* collective systems. One particularly prominent example of such nature-inspired systems is *swarm robotics*, where robot collaboration is an essential prerequisite for the successful execution of tasks [14–20]. Although the accurate understanding and systematic design of swarm robotic systems are considered to be among the greatest challenges of contemporary robotics [21], swarm robotics benefits strongly from the progress made in wireless communication technologies, system integration, machine learning, and artificial intelligence (AI). Consequently, artificial cooperative multi-agent systems gain in importance not only in applied science and engineering but also in fundamental research, allowing the shrinkage of the reality gap through detailed modeling and the accurate simulation of distributed biological systems.

Many natural collective behaviors were modeled using artificial collective systems, including bird flocking [22], locust marching [23] or cockroach aggregation [24]. However, among the most prominent challenges is the *collective foraging* task [18]. Extensive efforts were dedicated to the study of the foraging task because of its remarkable prevalence. The foraging behavior is found in various species and subspecies across the world [25,26]. In most cases, its efficient implementation is essential to group survival.

In essence, a multi-agent system performing the foraging task has the goal of collectively retrieving information and other resources from the environment. Different to a single-agent implementation, the benefit of a multi-agent system is its ability to share the accessed resources and information to enhance the overall performance. However, the multi-agent foraging task exhibits a considerable degree of complexity, making its modeling and analysis very demanding [18]. Successful collective foraging often requires a delicate combination of several extensively studied multi-agent sub-behaviors such as deployment [27,28], exploration [29,30], aggregation [31,32] or information sharing [33,34]. Hence, even though collective foraging itself can be considered to be a specific task within a large class of multi-agent problems, it rightfully receives separate attention in numerous contemporary studies [16,35–38].

Moreover, collective foraging is a promising behavior for many real-world applications such as exploration by aerial vehicles [39], underwater monitoring [40], or optimization of electrical networks [41]. Therefore, the foraging performance of artificial multi-agent systems, potentially in combination with other types of AI, is worth investigating in depth. In particular, in robot swarms, various fundamental questions have already been addressed, such as the influence of interference [42,43], the regulation of information flow [33] or the achievement of consensus [44]. Nevertheless, other relevant questions are still open to research: How does the distribution of individual features change in relation to input from the environment or social interactions? Is there a connection between particular feature distributions and the performance of the swarm? To address these questions with respect to scale-free properties, we simulate the foraging behavior in a robot swarm and analyze the emergence of scale-free features. For this purpose, the complexity of the foraging task is advantageous, as it offers a wide range of features that can be examined for their statistical tendency to be scale-free.

Our goal can be split into the following: (i) investigating the existence of scale-free features in a robot swarm performing the foraging task, (ii) studying the correlation between these features and the swarm performance, (iii) discussing the potential role of feedback mechanisms in the emergence of such scale-free features.

We begin with defining the robot (microscopic) and the swarm (macroscopic) behaviors in Sections 2.1 and 2.2, respectively. The link between these two levels of behaviors is formulated using statistical distributions and elaborated on in Section 2.4. In Section 2.5, we describe the experimental setup. Thereafter, in Section 3 we demonstrate the occurrence of scale-invariant features—such as those related to the communication degree or times spent in foraging or resting—and their correlation with swarm performance. Furthermore, we discuss the present feedback mechanisms that may support the emergence of scale-free features in a sophisticated set of scenario configurations. Lastly, the paper is concluded in Section 4.

#### **2. Methods**

#### *2.1. Robot Behavior*

We focus on the robot's decision-making process that is defined by the robot's interactions and the robot's individual preferences. The robots are situated in an arena which consists of a nest and a foraging area. Each robot can switch between two states: *resting* and *exploring*. In biology, similar behavior called *forager activation* has been observed in harvester ants, *Pogonomyrmex barbatus*, and reported in several publications [45–47]. In the exploring state, the robot moves around searching for items—which are located only in the foraging area—to retrieve to the nest. The robot moves in a straight line until it encounters another robot or a wall, in which case a *collision avoidance* maneuver is initiated. In the resting state, the robot rests inside the nest, and only in this state is it allowed to communicate with the neighbors within its line of sight. Specifically, each robot can broadcast a message about the success or failure of its latest exploration attempt or listen to its neighbors. Received information may either increase or decrease the robot's *probability* to switch from *resting* to *exploring* or vice versa. Probabilities are updated continuously using fixed probability jumps—we refer to those by the term *cues*, as in [14]. In the following, we introduce the different probability cues used in implementing the foraging behavior, in addition to the probabilities determining the switch between the two robot states. We also consider two distinct communication modes defining the duration of information exchange.

#### 2.1.1. State Switching Probabilities

Following [14], there is a minimum duration *θ* for the robot to stay in a certain state. The purpose of having such a threshold is to ensure that robots can perform the sub-tasks associated with this state for a certain amount of time so that necessary dynamics can take place. For instance, a minimum exploring time *θ<sup>e</sup>* needs to be at least as long as it takes for a robot to reach the most remote items (taking into account the constant linear speed of that robot) [16].

With this in mind, let us formulate the individual response to social and environmental cues in terms of *switching probabilities*. We denote by $i\_e, i\_r \in \mathbb{R}^+\_0$ the robot's internal (*i*) cues and by $s\_e, s\_r \in \mathbb{R}^+\_0$ its social (*s*) cues to switch to the exploring (*e*) or resting (*r*) state, respectively. The probability of a robot to switch from the resting state to the exploring state is denoted by $p\_{r \to e}$, whereas the probability to switch from the exploring state to the resting state is denoted by $p\_{e \to r}$.

The probabilities are updated iteratively at every simulation time step in a discrete manner as in the following:

$$p\_{r \to e}(t+1) = p\_{r \to e}(t) + \delta\_{\eta}(t)\, s\_e + \delta\_{\phi}(t)\, i\_e \tag{1}$$

$$p\_{e \to r}(t+1) = p\_{e \to r}(t) - \delta\_{\eta}(t)\, s\_r - \delta\_{\phi}(t)\, i\_r \tag{2}$$

where *δη*(*t*) is the number of 'success' minus 'failure' messages received by the robot from its neighbors at every time step spent in the *resting* state. Additionally, the robot's own experience is characterized using *δφ*(*t*). This is defined for every exploration attempt as follows:

$$\delta\_{\phi}(t) = \begin{cases} +1, & \text{at the time instance of finding an item, if } t\_{se} < t \le t\_{se} + \theta\_e \\ 0, & \text{at all other time instances, if } t\_{se} < t \le t\_{se} + \theta\_e \\ -1, & \text{if } t > t\_{se} + \theta\_e \text{ and the robot is still exploring} \end{cases} \tag{3}$$

with *tse* as the time at which the robot started its current exploration. Moreover, *δφ*(*t*) = 0 while the robot is resting. If *pe*→*r* < 0, it is truncated to *pe*→*r* = 0, and if *pe*→*r* > 1, it is truncated to *pe*→*r* = 1; the same holds for *pr*→*e*. Please note the strict difference between *δη*(*t*) and *δφ*(*t*): *δη*(*t*) may be non-zero only when the robot is resting inside the nest, because it is computed from the information broadcast by the neighbors, which can only be received in the nest (i.e., when the robot is in the resting state). In contrast, *δφ*(*t*) may be non-zero only when the robot is exploring, as it is computed from the robot's own foraging experience. Table 1 lists the parameters relevant for the computation of the switching probabilities.


**Table 1.** An overview of parameters defining the switching probabilities.
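The update rule of Equations (1) and (2), together with the truncation to [0, 1], can be sketched as follows (a minimal illustration, not the authors' ARGoS implementation; the function and variable names are ours):

```python
def clamp(p):
    # Truncate a probability to the interval [0, 1], as described in the text.
    return max(0.0, min(1.0, p))

def update_probabilities(p_re, p_er, delta_eta, delta_phi, s_e, i_e, s_r, i_r):
    """One discrete-time update of the switching probabilities, Eqs. (1)-(2).

    delta_eta: 'success' minus 'failure' messages received this time step
               (non-zero only while resting in the nest).
    delta_phi: own-experience cue from Eq. (3)
               (non-zero only while exploring).
    """
    p_re = clamp(p_re + delta_eta * s_e + delta_phi * i_e)  # Eq. (1)
    p_er = clamp(p_er - delta_eta * s_r - delta_phi * i_r)  # Eq. (2)
    return p_re, p_er
```

Since *δη*(*t*) and *δφ*(*t*) are never non-zero simultaneously, at most one of the two cue terms is active in any given time step.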

#### 2.1.2. Communication Modes

We focus on the local communication between the robots and its influence on the global swarm behavior. A common approach is to restrict robot communication *only* to the area within the nest. This approach is inspired by several natural systems, in which the communication takes place mainly inside the nest or hive, such as in the case of ants or honey bees [14,47–50]. Moreover, this approach accommodates two relevant properties of foraging systems: (i) it is common that the foraging area is significantly larger than the nest area, and hence, individual encountering rates outside the nest are negligibly low; (ii) the high density of individuals within the nest leads to more accurate information about the environment due to the high encounter rate of individuals that explored different, distant parts of the foraging area.

Regarding particular communication strategies, it is common to let robots broadcast the last exploration result only *once*, namely when the robots switch to the *resting* state. Henceforth, we will refer to this approach as the *discontinuous communication mode* (DCM), because after broadcasting the message once, the active communication of the robot is interrupted and is limited to listening. In contrast, we use the term *continuous communication mode* (CCM) to refer to the mode in which robots continue broadcasting the result of their last foraging attempt at every time step until they switch back to the *exploring* state. As we will see later, the difference between these two modes does not have substantial impact on swarm performance. However, it has a significant impact on the statistical distribution of various system features for which we study the scale-free property.
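The per-time-step broadcast decision that distinguishes the two modes can be sketched as follows (an illustrative helper, not taken from the original implementation):

```python
def should_broadcast(mode, is_resting, just_switched_to_rest):
    """Decide whether a robot broadcasts its last foraging result this time step.

    mode: 'DCM' broadcasts only once, upon switching to the resting state;
          'CCM' broadcasts at every time step spent resting.
    """
    if not is_resting:
        return False  # communication takes place only inside the nest
    if mode == 'DCM':
        return just_switched_to_rest
    return True  # CCM: keep broadcasting until switching back to exploring
```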

#### *2.2. Swarm Behavior*

At the macroscopic level, global behavior emerges as a result of complex interactions between the robots as well as between robots and their environment. The quality of such global behavior is evaluated with respect to quantifiable objectives. In the present study, we define the swarm performance in terms of three quantitative measures:

1. the total number of items collected and retrieved to the nest, *Ncoll*;

2. the average number of collected items per time spent on collision avoidance $\omega\_{ca} = \frac{N\_{coll}}{T\_{ca}}$, where *Tca* is the aggregate time that all individuals spent on collision avoidance;

3. the average number of collected items per time spent exploring $\omega\_e = \frac{N\_{coll}}{T\_e}$, where *Te* is the aggregate time that all individuals spent in the exploring state.
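Given aggregate counters logged during a run, the three measures reduce to simple ratios (a sketch with illustrative names; *Tca* denotes the aggregate collision-avoidance time):

```python
def swarm_performance(n_coll, t_ca, t_e):
    """Compute the three performance measures from aggregate run statistics.

    n_coll: total number of items retrieved to the nest (N_coll)
    t_ca:   aggregate time all robots spent on collision avoidance (T_ca)
    t_e:    aggregate time all robots spent in the exploring state (T_e)
    """
    omega_ca = n_coll / t_ca  # items per unit collision-avoidance time
    omega_e = n_coll / t_e    # items per unit exploring time
    return n_coll, omega_ca, omega_e
```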

#### *2.3. Measured Features*

There is a large variety of features that could potentially have scale-free character in collective foraging. We investigate such features, categorizing them into *space* and *time* features. An overview of the measured features is given in Table 2. Space features are mostly related to the inter-robot communication according to the robots' distribution in the arena. This is a distribution that changes over time while the robots are in motion. Among the space features, the robot's communication degree *d* is the most important and evident. It is defined as the number of communication links to neighbors within the robot's communication range. However, in dynamic topologies—where robots move around and neighbor lists are constantly updated—the communication degree *d* changes frequently. Hence, we additionally track the change of the communication degree Δ*d* of a robot whenever fellow robots enter or leave its communication range. Besides the communication degree, we analyze space features that reflect the foraging progress, such as the difference between the number of *received success* and *failure* messages, denoted by the *critical degree dc*,*rec*. Similarly, we include features that reflect the success-degree of a particular individual by measuring the difference of *success* to *failure* messages *sent* by that individual, denoted by *dc*,*sent*.


**Table 2.** An overview of the investigated space and time features.

With respect to *time* features, we note that in swarm robotics the individuals are commonly subject to physical interference. Robots interfere with each other or with obstacles as a result of finite-size effects influencing the dynamics of the collective behavior [42,43,51]. Therefore, we investigate the time spent on collision avoidance, denoted by *τca*. Additionally, we study time features that are related to the robot's exploring time *τe*. This time can be split into foraging time *τ<sup>f</sup>* , i.e., the time spent on searching for items, and homing time *τh*, i.e., the time spent on returning to the nest. While a long foraging time effectively increases the probability of finding items, long homing times indicate overcrowding close to the nest. Finally, another relevant time feature is the resting time *τ<sup>r</sup>* that includes the duration of robot interaction within the nest.

#### *2.4. Data Analysis*

In complex systems such as swarm robotics, the statistical analysis of relevant system properties paves the way to mathematical modeling, useful simplifications, or inference of long-term behaviors. Consequently, it helps in defining the link between the individual robot behavior and the emergent global swarm behavior, referred to as the micro-macro link [52]. In our study, we focus on how the collective foraging behavior can be related to the scale-freeness of a set of individual and global features. The main statistical characteristic of scale-free features is that they are distributed according to a power law [1,53]. Therefore, to identify scale-free features in our simulated swarms, it is of central importance to measure the statistical distribution of these features and to perform a sound power law fitting procedure.

#### 2.4.1. Power Law Fitting Procedures

To verify whether a feature is scale-free, we use a set of techniques that are described in [53–55] for fitting its distribution by the *power law distribution*. The power law distribution takes the form of a straight line on a log-log scale of *p*(*x*). However, most real-world data displays significant fluctuations due to randomness. When fitting a power law to the data, random fluctuations are accounted for by the statistical value *p*, which represents the *goodness-of-fit*. When *p* < 0.1, the power law fit can be considered unreliable [54]. Furthermore, power law behavior mostly emerges only in the tail of the distribution, i.e., for higher values of *x* above a statistically determined lower bound *xmin* [53]. Please note that this effectively reduces the data set to which the power law is fitted; it is therefore important to keep track of the ratio of the number of points satisfying *x* > *xmin* to the total number of data points. Finally, there are several other statistical distributions that may resemble the characteristic straight-line tendency of a power law on a log-log plot. Hence, for a sound statistical analysis it is important to compare the power law fit to other statistical models [54–56]. More precisely, the power law fitting procedure can be summarized by the following three steps:

1. Using maximum likelihood estimation, fit the data by the power law distribution

$$p\left(x; x\_{min}, \alpha\right) = \frac{\alpha-1}{x\_{min}} \left(\frac{x}{x\_{min}}\right)^{-\alpha}, \tag{4}$$

where *α* is the scaling parameter and *xmin* is the lower bound. In particular, *α* and *xmin* are estimated using procedures described in [54].

2. Compute the *goodness-of-fit* *p*-value of the resulting fit, comparing the empirical data to the fitted model as described in [54].

3. Compare the power law fit to alternative statistical distributions by means of likelihood-ratio tests [54–56].
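For a fixed lower bound, the maximum likelihood estimate of the scaling parameter in Equation (4) has the closed form $\hat{\alpha} = 1 + n \left[ \sum\_i \ln (x\_i / x\_{min}) \right]^{-1}$ for continuous data [54]. A minimal sketch (in practice, *xmin* is estimated as well, e.g., by minimizing the Kolmogorov–Smirnov distance, which this sketch omits):

```python
import math
import random

def fit_alpha(samples, x_min):
    """MLE of the power law scaling parameter for continuous data, given x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

# Sanity check on synthetic power law data via inverse-transform sampling:
# x = x_min * (1 - u)**(-1 / (alpha - 1)) is distributed according to Eq. (4).
random.seed(0)
alpha_true, x_min = 2.5, 1.0
data = [x_min * (1.0 - random.random()) ** (-1.0 / (alpha_true - 1.0))
        for _ in range(100_000)]
alpha_hat = fit_alpha(data, x_min)  # close to alpha_true for large samples
```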


#### 2.4.2. Quality Ratio *ρ<sup>q</sup>*

Given a high quantity of empirical data sets, it is useful to find an automated way to evaluate the power law fits. For the analysis of our experiments, we introduce a quality ratio *ρq* which we use as a practical estimate of the plausibility of a (truncated) power law fit, based on the well-known rigorous statistical tests described above. The quality ratio *ρq* includes the three criteria discussed in Section 2.4.1: the *p*-value, *Ndata*,*pl* and the number of likelihood-ratio tests resulting in *R* > 0. We account for these criteria by defining *ρq* as the product of *ρq*,*p*, *ρq*,*data* and *ρq*,*lrt*:

1. First, we begin with the *p*-value. As mentioned above, the linear shape of the data distribution on a log-log plot can be mainly attributed to random fluctuations if *p* < 0.1. Taking this into account, we design *ρq*,*p* to be a binary piecewise function evaluating the *goodness-of-fit* in terms of the *p*-value:

$$\rho\_{q,p} = \begin{cases} 1.0, & \text{if } p > 0.1 \\ 0.0, & \text{otherwise.} \end{cases} \tag{5}$$

This way, we take into account the possibility that random fluctuations may be present, but as soon as *p* > 0.1 we do not assign the precise value of *p* to the ranking of the fit. The reason is that the *p*-value could be very low, due to random fluctuations, even if the data is in fact power law distributed. In general, it is more informative to consider the size of the fitted data set and to compare the power law fit to other important distributions [54,56].

2. Second, *ρq*,*data*, denotes the ratio of the data which is fit by the (truncated) power law *Ndata*,*pl* to the total number of data points *Ndata*,*tot*.

$$\rho\_{q,data} = \frac{N\_{data,pl}}{N\_{data,tot}}, \tag{6}$$

3. Third, *ρq*,*lrt* represents the fraction of likelihood-ratio tests in which the (truncated) power law fit proved to be statistically more plausible than other distributions. To include the quality of the (truncated) power law fit as compared to other distributions, we count the number of times, denoted *nlrt*,*pl*, that we obtained *R* > 0 from the likelihood-ratio tests. We compare the power law fit to *six* distributions: *truncated power law*, *exponential*, *stretched exponential*, *lognormal*, *positive lognormal* and *normal*; all of them are implemented in [56] (except the normal distribution). Hence, we use the piecewise function

$$\rho\_{q,lrt} = \begin{cases} 0, & \text{if at least one likelihood-ratio test yields } R < 0 \\ \frac{n\_{lrt,pl} + 1}{7}, & \text{otherwise,} \end{cases} \tag{7}$$

where we added 1 to *nlrt*,*pl* to account for the possibility that the likelihood-ratio test yields *R* ≈ 0, in which case the support for the power law fit is neither strengthened nor weakened.

Please note that in Equation (7) we set *ρq*,*lrt* = 0 if at least one distribution is a more reliable model than the power law. However, it is important to remember that our simulated systems are meant to include real-world attributes (e.g., finite-size effects, physical interference, line-of-sight interruptions during communication) and therefore deviate from ideal systems. Consequently, the assumption of power law (i.e., scale-free) distribution might be distorted and needs to be corrected. The deviation is often particularly distinct in the heavy tail. Therefore, one common correction technique is to consider the power law distribution with an exponential cutoff (also known as *truncated power law*) [57]:

$$p\left(x\right) = \frac{\lambda\left(\lambda x\right)^{-\alpha}}{\Gamma\left(1-\alpha, \lambda x\_{min}\right)} e^{-\lambda x}, \tag{8}$$

where *λ* is the scaling parameter of the exponential decay and Γ(*y*, *z*) is the upper incomplete gamma function. While Equation (4) directly implies that the feature is scale-free, Equation (8) describes an asymptotic scale-freeness: it approaches the power law distribution for *λx* → 0 and the exponential distribution for *λx* ≫ 1. Thus, accepting that our systems are significantly constrained by physical boundaries, we can slightly soften the criteria given by Equation (7) in the following way:

If the truncated power law passes more likelihood-ratio tests than the power law fit, i.e., if *nlrt*,*tpl* > *nlrt*,*pl*, we consider the success-ratio of the former. In short:

$$\rho\_{q,lrt} = \begin{cases} 0, & \text{if at least one likelihood-ratio test yields } R < 0 \\ \frac{n\_{lrt,tpl} + 1}{7}, & \text{if } n\_{lrt,tpl} > n\_{lrt,pl} \\ \frac{n\_{lrt,pl} + 1}{7}, & \text{otherwise.} \end{cases} \tag{9}$$

Finally, including all the above criteria, we define the quality ratio:

$$
\rho\_q = \rho\_{q,p} \cdot \rho\_{q,data} \cdot \rho\_{q,lrt}.\tag{10}
$$

Consequently, we obtain *ρq* = 0 if *p* ≤ 0.1 or *R* < 0. Conversely, *ρq* = 1 in the case of *p* > 0.1, *Ndata*,*pl* = *Ndata*,*tot* and *nlrt*,*pl* = 6, which is an unlikely but nevertheless possible scenario. Using this ranking, we can link the quality of a fit to a quantifiable value and describe the support for the (truncated) power law as illustrated in Table 3.

**Table 3.** Classification of the power law fit quality with respect to the quality ratio *ρq* used in our study.


The denominator value represents the total number of considered distributions (i.e., the power law and the six alternative distributions we compare it to). The lower limit of the 'moderate' classification corresponds to the case with *ρq*,*p* = 1, *ρq*,*data* = 0.1 and *ρq*,*lrt* = 1/7, i.e., at least 10% of the data is included in the fit and none of the alternative distributions is a statistically better fit than the power law. The upper limit considers the case with *ρq*,*p* = 1 and *ρq*,*data* · *ρq*,*lrt* = 0.5/7, i.e., either the fit includes a high number of data points or the power law is statistically a better fit than other distributions. Please note that *ρq* multiplicatively combines standard power law fitting techniques [53–56] into a quantitative estimate of the quality of the (truncated) power law fit.
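Equations (5)–(7) and (10) combine into a single scalar; the following sketch uses illustrative argument names and hard-codes the denominator 7 (the power law plus the six alternatives):

```python
def quality_ratio(p_value, n_data_pl, n_data_tot, n_lrt_pl, any_r_negative):
    """Quality ratio rho_q of a (truncated) power law fit, Eq. (10).

    p_value:        goodness-of-fit p-value of the power law fit
    n_data_pl:      number of data points covered by the fit (x > x_min)
    n_data_tot:     total number of data points
    n_lrt_pl:       number of likelihood-ratio tests won by the power law (R > 0)
    any_r_negative: True if any test preferred an alternative distribution (R < 0)
    """
    rho_p = 1.0 if p_value > 0.1 else 0.0                    # Eq. (5)
    rho_data = n_data_pl / n_data_tot                        # Eq. (6)
    rho_lrt = 0.0 if any_r_negative else (n_lrt_pl + 1) / 7  # Eq. (7)
    return rho_p * rho_data * rho_lrt                        # Eq. (10)
```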

It is important to emphasize that even if the hypothesis of the data following the power law distribution is found to be plausible using the above statistical analysis, care needs to be taken when interpreting this observation. Firstly, there is still no guarantee that the data is in fact power law distributed and although our rigorous analysis includes several common distributions, other non-obvious distributions may prove to be a better fit. Secondly, the power law fit may be valid only for a small fraction of data. However, as the power law behavior is commonly found for a subset of data, namely at the tail of the distribution, the group that displays power law (i.e., scale-free) behavior includes individuals that stand out from the rest of the swarm by having features with values that are significantly above average. The way in which such individuals impact the global swarm performance remains an open question worth investigating.

#### 2.4.3. Correlation Measures

To examine the presence of correlations between the support for the power law distribution (i.e., the value of *ρq*) and the swarm performance, it is important to use an appropriate correlation measure. One of the most prominent correlation measures is the *Pearson* correlation coefficient [58,59]. It evaluates the quality of a linear association between two distributions. In essence, it divides the covariance of two variables by the product of their standard deviations. It is closely related to linear regression and does not require the data to be normally distributed. Despite its mathematical simplicity, it is an appropriate correlation measure for many distributions and, therefore, is widely used [60–63].

However, one could argue that the Pearson correlation coefficient is not ideal for skewed distributions with strong outliers. Popular alternatives are the *Spearman's rank* and the *Kendall's tau* correlation coefficients [62–65]. Both are based on generating ranked distributions by assigning a rank to each variable with respect to its value. The correlation coefficient is then given as a measure of the association between the two ranked distributions. Consequently, both correlation metrics are robust to outliers and suitable for non-linear distributions.

Although both correlation measures commonly return very similar results, Kendall's tau handles *ties* (i.e., cases in which there is no difference between the ranks) in a mathematically more straightforward way. More precisely, Kendall's tau returns the density difference between *concordant* and *discordant* pairs. Consider two vectors of length *n*, (*x*1, *x*2, ..., *xn*) and (*y*1, *y*2, ..., *yn*). Concordant pairs are pairs of data points that satisfy sgn(*xi* − *xj*) sgn(*yi* − *yj*) > 0 (where sgn(*z*) is the sign function, equal to +1 if *z* > 0, −1 if *z* < 0 and 0 if *z* = 0); similarly, discordant pairs satisfy sgn(*xi* − *xj*) sgn(*yi* − *yj*) < 0. Furthermore, ties are pairs for which *xi* = *xj* or *yi* = *yj*. Hence, with *nc* (*nd*) as the number of concordant (discordant) pairs, respectively, and *nx* (*ny*) as the number of ties in *x* (*y*), respectively, Kendall's tau (also known as Kendall's tau-b) is given by [66]:

$$
\tau\_{Kendall} = \frac{n\_c - n\_d}{\sqrt{n\_c + n\_d + n\_x}\,\sqrt{n\_c + n\_d + n\_y}} \tag{11}
$$

with

$$n\_{\mathbf{c}} = \sum\_{i,j} \delta\_{i,j}^{(c)}, \; n\_{\mathbf{d}} = \sum\_{i,j} \delta\_{i,j}^{(d)}, \; n\_{\mathbf{x}} = \sum\_{i,j} \delta\_{i,j}^{(x)}, \; n\_{\mathbf{y}} = \sum\_{i,j} \delta\_{i,j}^{(y)}, \tag{12}$$

where

$$\begin{aligned} \delta\_{i,j}^{(c)} &= \begin{cases} 1, & \text{if } \operatorname{sgn}(\mathbf{x}\_i - \mathbf{x}\_j)\operatorname{sgn}(y\_i - y\_j) > 0 \\ 0, & \text{else} \end{cases} \\ \delta\_{i,j}^{(\mathbf{x})} &= \begin{cases} 1, & \text{if } \mathbf{x}\_i = \mathbf{x}\_j \\ 0, & \text{else} \end{cases} \end{aligned} \tag{13}$$

and similarly for $\delta^{(d)}\_{i,j}$ and $\delta^{(y)}\_{i,j}$.
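Equations (11)–(13) translate directly into a brute-force O(*n*²) computation over all pairs *i* < *j* (an illustrative sketch; optimized library routines exist for large data sets):

```python
def kendall_tau_b(x, y):
    """Kendall's tau-b following Eqs. (11)-(13), counting each pair i < j once."""
    def sgn(z):
        return (z > 0) - (z < 0)

    n_c = n_d = n_x = n_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = sgn(x[i] - x[j]) * sgn(y[i] - y[j])
            if s > 0:
                n_c += 1  # concordant pair
            elif s < 0:
                n_d += 1  # discordant pair
            if x[i] == x[j]:
                n_x += 1  # tie in x
            if y[i] == y[j]:
                n_y += 1  # tie in y
    denom = ((n_c + n_d + n_x) * (n_c + n_d + n_y)) ** 0.5
    return (n_c - n_d) / denom
```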

#### *2.5. Simulation Setup*

We designed and implemented a set of physics-based simulations using the state-of-the-art simulator for large-scale swarms, ARGoS [15]. An overview of all parameter values used in our simulations is given in Table 4. The simulations are conducted in a square-shaped arena confined within four walls, each 50 m long. The arena is divided into two regions: (i) the nest *An*, the gray 10 × 50 m<sup>2</sup> area in Figure 1a, and (ii) the foraging area *Af*, the white 40 × 50 m<sup>2</sup> area in Figure 1a.

The items are scattered uniformly over the foraging area and, as in [14], reappear with constant probability after robots retrieve them to the nest. This prevents the system from drifting into an absorbing state in which there are no items left to recover.

**Figure 1.** Illustrations of the arena. (**a**) A snapshot from a simulation in ARGoS. Gray area: nest; white area: foraging field; black dots: items; blue objects: Footbots; light-blue lines: communication (range-and-bearing) links. (**b**) 3D view on the same arena. In both figures, the communication links are formed *only* for *resting* robots *inside* the nest, as in our experiments moving robots neither broadcast nor listen to any messages.


**Table 4.** Robot and arena parameters used for the simulation setup.

A phototaxis behavior is used to assist the robots in leaving and re-visiting the nest. For that purpose, light beacons are positioned equidistantly at the nest wall (yellow dots at the bottom of Figure 1a). Their light is perceived by the robots' light sensors. Each robot is programmed to move *away* from the beacons when it needs to *leave* the nest and *towards* the beacons when it needs to *return*. We use a homogeneous swarm of Footbots (see http://www.swarmanoid.org/swarmanoid\_hardware.php) in our simulations, and the communication radius of the robots is set such that the circular communication area around a robot covers 0.982% of the nest area, which is close to the fraction used in [14]. For better readability, we will limit our discussion of the robot states to only *resting* and *exploring*. While the former is distinct, the latter is composed of further states, of which only *foraging* and *homing* are relevant because they are the most time consuming (for a detailed list of the robot states please see Supplementary Material Section 2).

At the beginning of the simulation, each robot switches from resting to exploring with a probability of 0.01. Consequently, within the first 500 time steps (ts) most robots leave the nest. After another ≈ 500 ts most of the swarm returns, with or without an item. Although this behavior is subsequently repeated several times, the number of simultaneously switching robots gradually decreases, and the switching rate from resting to exploring (or vice versa) approaches a constant limit. In most cases, the system approached an equilibrium after 5 · 10<sup>3</sup> ts (for the given arena, item density, and swarm size). Our measurements of the system features begin from that time instance onwards and the experiment proceeds for another *T* = 10<sup>4</sup> ts.

Furthermore, to conduct a solid statistical study we use large-scale swarms with *Nrobots* = 950 units which is up to an order of magnitude higher than what is commonly used [14,16,36,42]. We selected the value of *Nrobots* by running preliminary experiments, in which we observed for this particular swarm size—under the given arena and item density—a maximum in swarm performance.

Finally, in our experiments, the most important means of influencing the swarm dynamics is by adjusting the numerical values of the internal and social cues—*ie*, *ir* and *se*,*sr*, respectively—at the start of each experiment. We consider a spectrum of 256 distinct scenario configurations, which differ by the 4-tuples *a* = (*se*, *ie*, *ir*,*sr*) drawn from:

$$\Omega := \{ a : s\_e, i\_e, i\_r, s\_r \in \{ 0.0, 0.01, 0.5, 0.9 \} \}. \tag{14}$$

The rationale behind the choice of these parameter values is to include four fundamentally different kinds of cue impact on swarm dynamics: (i) none, (ii) low, (iii) intermediate and (iv) high. Please note that any additional value in the set greatly increases the associated computational and analytic effort, as the number of scenarios grows with the fourth power of the number of values per cue (here 4<sup>4</sup> = 256). However, based on preliminary results, additional values would offer potentially little informative gain (at the current stage) because the swarm dynamics would be similar to a mix of the dynamics generated by the above values.
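The scenario set Ω of Equation (14) is the Cartesian product of the four cue values over the four cues:

```python
from itertools import product

CUE_VALUES = (0.0, 0.01, 0.5, 0.9)

# All 4-tuples a = (s_e, i_e, i_r, s_r) from Equation (14).
scenarios = list(product(CUE_VALUES, repeat=4))  # 4**4 = 256 configurations
```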

#### **3. Results and Discussion**

We performed simulations with all combinations of cues and communication modes. Each simulation was repeated with 30 random seeds and the data analysis procedure was carried out as discussed in the previous section.

#### *3.1. Presence of Power Law Distributed Features*

The analysis of our simulation data shows that in most scenarios there was only weak or no statistical support for the (truncated) power law distribution (see Figure 2a). In particular, in roughly half of all scenarios no power law distributed features were found. This observation suggests that in the present system, scale-free features are rare. Nevertheless, we found 245 + 71 = 316 (truncated) power law distributions with moderate or strong statistical plausibility for different features in various scenario configurations and for both communication modes, DCM and CCM. Thus, our findings are in line with a recent study showing that scale-free networks may occur rarely but across different areas [55].

As the scatter plots in Figure 2 show, most of the distributions with weak or moderate support for power law are concentrated below or close to the average values of swarm performance, while the distributions with strong support for power law are associated with above-average performance in terms of *Ncoll* and *ωca*. Swarm performance was measured using (i) the number of items retrieved by the robots, *Ncoll*, (ii) the average number of collected items per time spent on collision avoidance, *ωca*, and (iii) the average number of collected items per time spent exploring, *ωe*. Figure 3 shows the values recorded for these three metrics under both communication modes, CCM and DCM, and for the entire range of 256 scenario (cue) configurations. Repeating performance patterns can be observed over different sets of configurations. The regions over which these patterns emerge are (from left to right): (i) all scenarios with *se* = 0 and *ie* = 0, i.e., constant *pr*→*e* (blue region with a left-tilted mesh in Figure 3); (ii) all scenarios with *se* = 0 and *ie* > 0, i.e., no social and only internal influence on *pr*→*e*, henceforth referred to as *NSe* (shown in orange, no mesh); (iii) all scenarios with *se* = 0.01, i.e., low social impact on *pr*→*e* (green region with vertical mesh, henceforth denoted as *LSe*); and (iv) all scenarios with *se* = 0.5 or *se* = 0.9, i.e., high social influence on *pr*→*e* (red region with right-tilted mesh, henceforth denoted as *HSe*). Please note that in all four regions *pe*→*r* is altered in the same way, i.e., for *sr* and *ir* all values from {0.0, 0.01, 0.5, 0.9} are included. The best swarm performance in terms of *Ncoll* and *ωca* emerges when the influence of internal cues on the swarm dynamics is negligible compared to social cues, i.e., when $s\_e \sum\_t |\delta\_\eta(t)| \gg i\_e \sum\_t |\delta\_\phi(t)|$ and $s\_r \sum\_t |\delta\_\eta(t)| \gg i\_r \sum\_t |\delta\_\phi(t)|$.

**Figure 2.** (**a**) Feature data sets obtained from simulations, sorted by the type of statistical support for a corresponding power law fit. The classifications follow Table 3. (**b**–**d**) Log-linear scatter plots relating the power law fit quality ratio *ρ<sup>q</sup>* to the swarm performance in terms of *Ncoll*, *ωca* and *ωe*, respectively. The vertical dashed lines indicate the mean performance values while the horizontal dashed lines separate the quality categorizations taken from Table 3.

**Figure 3.** Swarm performance in terms of (**a**,**b**) *Ncoll*; (**c**,**d**) *ωca* and (**e**,**f**) *ωe*, respectively. For each performance measure, 256 scenario configurations were implemented (i.e., with all cue values from Equation (14)), using one of two communication modes: DCM (left) and CCM (right). The x-axis represents the IDs of the scenario configurations. The colors and the mesh patterns highlight regions that display different dynamics. Apart from (**f**), in all plots the red dots mark the scenarios in which the feature mentioned in the inset demonstrated a high value of *ρq*, i.e., there was a strong support for the distribution to be power law. In (**f**), the red dots mark the scenarios with moderate support. See Supplementary Material Section 3 for combined plots of *Ncoll* and *d* distribution in CCM over the complete set of 256 scenario configurations.

The best performance levels in terms of *Ncoll* and *ωca* were reached over the *LSe* region. For instance, the maxima of *Ncoll* and *ωca* correspond to the scenario configurations in which *se* = 0.01, *ir* = 0 and *sr* ≥ 0.5. For the same configurations, (truncated) power law distributions of space features were found in the CCM (examples shown in Figure 4). Contrary to CCM, in the DCM the robot interactions are interrupted. These interruptions may explain why, in DCM, space features such as communication degree tend not to follow a power law distribution (weak overall support for the presence of a power law behavior). Nevertheless, we found fits with moderate to strong support for (truncated) power law for time features, such as *τ<sup>r</sup>* and *τca*, demonstrated in Figure 5. The best power law fits of the DCM correspond to the peaks in swarm performance in terms of *Ncoll* and *ωca* over the *HSe* regions.
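The power law fits referenced throughout this section can be reproduced in outline with the standard continuous maximum-likelihood estimator for the exponent (the estimator popularized by Clauset and co-workers). The sketch below runs it on synthetic data only, since the feature data and the authors' exact fitting pipeline (including the truncated variant and the quality ratio *ρq*) are not shown here.

```python
import numpy as np

def fit_power_law_mle(samples, xmin):
    """Continuous MLE of alpha for p(x) ~ x^(-alpha), x >= xmin."""
    x = np.asarray(samples, dtype=float)
    x = x[x >= xmin]
    n = x.size
    alpha = 1.0 + n / np.sum(np.log(x / xmin))
    sigma = (alpha - 1.0) / np.sqrt(n)  # standard error of the estimate
    return alpha, sigma

# Synthetic power-law sample via inverse-transform sampling:
# F(x) = 1 - (x / xmin)^(1 - alpha)  =>  x = xmin * (1 - u)^(-1 / (alpha - 1))
rng = np.random.default_rng(0)
true_alpha, xmin = 2.5, 1.0
u = rng.random(50_000)
data = xmin * (1.0 - u) ** (-1.0 / (true_alpha - 1.0))

alpha_hat, sigma = fit_power_law_mle(data, xmin)
print(f"alpha = {alpha_hat:.3f} +/- {sigma:.3f}")  # recovers a value close to 2.5
```

Goodness-of-fit assessment (e.g., via Kolmogorov–Smirnov distance and bootstrapping, as in the standard Clauset recipe) would be layered on top of this estimate.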

**Figure 4.** Log-log scale plots of the degree *d* (top) and the critical degree *dc*,*rec* (bottom) distributions in CCM. The black lines represent the corresponding truncated power law fits. The insets show the fit parameters as well as the scenario configurations. The plots (**a**,**b**) differ by the scenario configurations shown in the insets; similarly for (**c**,**d**). These scenarios are among the top five swarm performances with respect to *Ncoll*. Please note that *λxmin* is relatively small, i.e., power law is a good fit for *x* close to *xmin*. See Supplementary Material Section 3 for plots of *d* in CCM over all 256 scenario configurations.

**Figure 5.** Log-log scale plots of the resting time *τr* (top) and the collision avoidance time *τca* (bottom) distributions in DCM. The black lines represent the corresponding truncated power law fits. The insets show the fit parameters as well as the scenario configurations. The plots (**a**,**b**) differ by the scenario configurations shown in the insets; similarly for (**c**,**d**). These scenarios belong to a subset of the best swarm performance with respect to *ωca*. Please note that *λ* = 0 for the fits of *τca*, indicating better support for the power law fit than for truncated power law.

The third performance measure, i.e., *ωe*, reached its best values over the *NSe* region. Its maxima correspond to cases where *se* = 0 and *sr* = 0. Interestingly, for these scenario configurations we found fits with moderate to strong support for (truncated) power law for the data of Δ*d*, i.e., the change of the average communication degree of the robot (examples shown in Figure 6). This is an interesting finding because it indicates that a communication feature may be power law distributed also in those scenarios in which the swarm tries to minimize the number of foraging robots and maximize the number of resting ones. Moreover, in most Δ*d* distributions with strong or moderate support for the power law, the fit includes only 10–20% of data points. The reason for this relatively low ratio of power law fitted data is that the tail of the distribution is likely represented by the fraction of robots that rest or move close to the border between the nest and the foraging area.

In general, the findings suggest that internal cues (in the absence of social cues) keep robots at the edge of minimal activity while social cues (in the absence of internal cues) drive the robots towards maximal activity.

**Figure 6.** Log-log scale plots of Δ*d* per 100 ts in CCM. The black lines represent the corresponding truncated power law fits. The insets show the fit parameters as well as the scenario configurations. The plots (**a**,**b**) differ by the scenario configurations shown in the insets. These scenarios are among the best swarm performances with respect to *ωe*.

#### *3.2. Correlation with Swarm Performance*

In the previous section we illustrated that swarm performance is likely to reach its peaks over cue configurations that include asymptotically scale-free space or time features (see Figure 3). In this section, we analyze this observation statistically, using correlation measures such as the Pearson and Kendall's tau rank correlation coefficients introduced in Section 2.4.3. However, note that both correlation measures have strengths and shortcomings. On the one hand, while the Pearson correlation coefficient is widely used and has an elegant mathematical form, it is sensitive to outliers and may not be appropriate for non-linear distributions. On the other hand, Kendall's tau is suitable for non-linear distributions as well as being robust to outliers. Nevertheless, reducing the values to ranks may disregard the significance of a variable's value being far from the average. In particular, replacing the real value of the quality ratio *ρ<sup>q</sup>* by its rank leads to a loss of information about the extent to which *ρ<sup>q</sup>* represents the quality of the power law distribution. Moreover, following the definition of Kendall's tau in Equations (11)–(15), each difference between data point pairs is assigned the same weight, which may not always be appropriate. For instance, consider the ranked swarm performance in terms of *Ncoll* and the corresponding distribution of *ρ<sup>q</sup>* in CCM for *d* in Figure 7a and for *dc*,*sent* in Figure 7b, respectively. In both cases, Kendall's tau defined by Equations (11)–(15) returns values indicating no correlation (i.e., *τKendall* = 0.02 and *τKendall* = −0.04, respectively). However, as evident in Figure 7, both cases show different dynamics, with *ρ<sup>q</sup>* for *d* following *Ncoll* more closely than for *dc*,*sent*.
The main reason is that the dominant fluctuations of *ρ<sup>q</sup>* close to zero are assigned the same weight (i.e., rank step 1) as the more permanent increase of *ρ<sup>q</sup>* for high values of *Ncoll*. Similar considerations hold for the other features and the DCM. To account for this type of behavior, we use a generalization of Equation (15) that weights the ranking steps by a parameter *κ*, which is relative to the average change, such that:

$$\begin{aligned} \delta_{i,j}^{(c)} &= \begin{cases} \kappa, & \text{if } \operatorname{sgn}(x_i - x_j)\operatorname{sgn}(y_i - y_j) > 0 \\ 0, & \text{else} \end{cases} \\ \delta_{i,j}^{(x)} &= \begin{cases} \kappa, & \text{if } x_i = x_j \\ 0, & \text{else} \end{cases} \end{aligned} \tag{15}$$

and similarly for $\delta_{i,j}^{(d)}$ and $\delta_{i,j}^{(y)}$. The weight parameter *κ* is given by

$$\kappa = \frac{1}{2} \left( \frac{|x_i - x_j|}{\mu_x} + \frac{|y_i - y_j|}{\mu_y} \right), \tag{16}$$

where *μ<sup>x</sup>* and *μ<sup>y</sup>* are averages over all |*xi* − *xj*| and |*yi* − *yj*|, respectively. For each *i*, *j* pair, Equation (16) considers the data distances of both distributions, normalized by their respective mean distances. Consequently, *κ* does not favor any distribution and weights each ranking step relative to other distances. Please note that in general, there is no correlation metric that is perfectly adequate for all types of studies and data distributions; it is thus common to consider appropriate modifications [67–70]. In the present case, for *κ* = 1 we obtain the standard Kendall's tau rank correlation coefficient described in Section 2.4.3. However, by implementing Equation (16), the correlation coefficient is less sensitive to fluctuations than the standard Kendall's tau, while still being more robust to outliers and non-linearity than the Pearson correlation measure. Therefore, in the following we will use this modified Kendall's tau rank correlation coefficient to investigate the presence of correlations between *ρ<sup>q</sup>* and swarm performance.
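One plausible reading of the weighting scheme in Equations (15) and (16) is sketched below: each pair contributes its weight *κ* to a signed concordance sum, and the result is normalized by the total weight, so that setting *κ* = 1 for all pairs recovers the unweighted tau-a form. The normalization choice is our assumption; the paper's exact handling of tied ranks is not reproduced.

```python
import numpy as np

def weighted_kendall_tau(x, y):
    """Kendall-style rank correlation with pair weights
    kappa_ij = (|x_i - x_j| / mu_x + |y_i - y_j| / mu_y) / 2,
    so that large, persistent differences outweigh small fluctuations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)          # all pairs with i < j
    dx, dy = x[i] - x[j], y[i] - y[j]
    mu_x, mu_y = np.mean(np.abs(dx)), np.mean(np.abs(dy))
    kappa = 0.5 * (np.abs(dx) / mu_x + np.abs(dy) / mu_y)
    concordance = np.sign(dx) * np.sign(dy)      # +1 concordant, -1 discordant, 0 tie
    return np.sum(kappa * concordance) / np.sum(kappa)

rng = np.random.default_rng(1)
a = rng.random(200)
print(weighted_kendall_tau(a, a))    # → 1.0 (perfectly concordant)
print(weighted_kendall_tau(a, -a))   # → -1.0 (perfectly discordant)
```

Because *κ* grows with the normalized pair distances, dominant small fluctuations of *ρq* near zero contribute little, while the large, persistent increases that accompany high *Ncoll* dominate the statistic, which is exactly the behavior motivated in the text.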

**Figure 7.** Ranked distribution of *Ncoll* (dark red, left y-axis). For the same cue configurations, the CCM distributions of *ρ<sup>q</sup>* (right y-axis) for (**a**) *d* and (**b**) *dc*,*sent* are shown in blue, respectively. The insets depict the corresponding scatter plots with data points representing weak (circles), moderate (triangles) and strong (squares) support for power law distribution; gray lines indicate the onsets of the different support classifications (similar to Figure 2).

Table 5 shows, for all features, the correlations between the three measures of swarm performance and the feature 'scale-freeness' quantified by *ρq*. We found strong correlations of the scale-free property of various features with the swarm performance. In particular, high correlations exist for *τca*, *τr*, and Δ*d* in DCM and, additionally, for *d* and *dc*,*rec* in CCM. Remarkably, for those features for which we found moderate or high correlation values (highlighted in blue in Table 5), most high-quality power law fits appear in the same scenarios as the highest swarm performance peaks. The red dots in Figure 3 illustrate this finding by highlighting the scenarios in which the quality ratio is *ρ<sup>q</sup>* > 0.5. Moreover, the swarm tends to demonstrate low performance with respect to *Ncoll* and *ωca* for those scenarios in which *ω<sup>e</sup>* is highest, the latter being well correlated with Δ*d*.


**Table 5.** Correlation coefficients quantifying correlations of *ρ<sup>q</sup>* with *Ncoll*, *ωca* or *ωe*.

A high correlation coefficient means: the better the quality of (truncated) power law distribution, the higher the likelihood that the swarm performed well. Cells highlighted in blue show moderate or strong correlations.

The correlation coefficients confirm the observation, supported by the data shown in Figures 2 and 3, that most power law distributions with strong support (i.e., high *ρq*) appear in scenarios with peak swarm performance. To further examine this observation, we consider the correlations of the swarm performance with different *ρ<sup>q</sup>* support classifications (based on Table 3). As Table 6 shows, there are moderate and strong positive correlations between features with strong support for power law distribution and swarm performance in terms of *Ncoll* and *ωca* for both communication modes. This suggests that the observation of scale-free features is more likely in scenarios in which the agents are more successful in retrieving a high number of food items.

**Table 6.** Correlation coefficients between *ρ<sup>q</sup>* and *Ncoll*, *ωca* or *ω<sup>e</sup>* for different power law support classifications.


Correlation of the swarm performance with different categories of power law distribution support. Cells highlighted in blue show moderate or strong correlation values.

#### *3.3. The Role of Feedback Mechanisms in the Emergence of Scale-Free Features*

An attribute of complex systems that is widely known to support the emergence of scale-free characteristics is the presence of (positive and negative) feedback loops [1,53,71]. We specify the feedback effect to be positive or negative based on the individual response to the information input from its neighborhood. Hence, we refer to the feedback mechanism as *positive* feedback if it pushes the individuals to the same state as the state of the majority, whereas *negative* feedback pushes them away from it.

Most scale-free features were found in scenarios in which (i) the robot behavior was dominated by social interactions, (ii) the swarm attempted to balance positive and negative feedback and (iii) the swarm displayed a tendency towards active exploration. In particular, in CCM, the first 17 scenarios sorted by *ρ<sup>q</sup>* in descending order were found over the *LSe* region and with *ir* = 0. Similarly, in DCM, the first 28 scenarios were found over the *HSe* region and with *ir* = 0. To understand this, it is necessary to consider in more detail the impact of each cue on swarm dynamics and the feedback mechanisms.

For conciseness, we focus our system analysis on CCM and its most relevant set of parameter configurations. Similar conclusions hold for DCM. In particular, we can simplify our analysis based on the repeating patterns of swarm performance (see Figure 3) and the following observations: (i) The swarm performance is qualitatively very similar between the cue values 0.5 and 0.9 (for all four cues). Thus, we focus, in the following, on {*a* : *se*, *ie*, *ir*,*sr* ∈ {0.0, 0.01, 0.5}}. (ii) The cue *ie* has a negligible impact on the foraging dynamics when *se* > 0. By neglecting scenarios in which *ie* = 0, except those with *se* = 0, we can further shorten the set of relevant scenarios. Finally, (iii) there are significant differences in the dynamics between scenario configurations with *ir* = 0 and those with *ir* > 0 but negligible differences between *ir* = 0.01 and *ir* = 0.5, 0.9. Thus, we focus, in the following, only on scenarios with either *ir* = 0 or *ir* = 0.01. Figure 8 shows the final set of 24 scenarios relevant to the discussion below.

**Figure 8.** Number of collected items for a selected set of 24 scenario configurations *a* in CCM. The data labels show the corresponding cue values of *a* = (*se*, *ie*, *ir*,*sr*).

Please note that (ii) and (iii) are consequences of the internal cues *ie* and *ir* acting only on *exploring* robots. In the exploring state, the crucial parameter is *pe*→*<sup>r</sup>* because it defines the probability to stop exploring and change to resting. A non-zero value of the internal cue *ir* has a substantial impact on the dynamics as it alters *pe*→*<sup>r</sup>* after each exploration attempt. As the likelihood of finding and retrieving a food item is low, *ir* mostly reduces *pe*→*r*. The more *pe*→*<sup>r</sup>* is lowered by *ir*, the less likely the robot is to find a food item during the next exploration attempt. Thus, *ir* has a strong inhibitory influence on the swarm's exploration activity. Consequently, there is a significant difference in swarm performance between the scenario configurations with *ir* = 0 and those with *ir* > 0. As the swarm actively attempts to explore the environment and collect food items, the influence of *ir* > 0 can be considered an important driver of negative feedback. In contrast, *pr*→*<sup>e</sup>* acts only on *resting* robots. Consequently, any change of *pr*→*<sup>e</sup>* through *ie* is easily distorted by *se*, i.e., by the social interactions with the neighborhood of the resting robot. Hence, only when *se* = 0 does *ie* have an inhibitory impact on the swarm dynamics (similar to *ir*, due to the scarcity of food items); otherwise *ie* is negligible. In short, when the probability of finding a food item is low, setting *ir* = 0 and *se* > 0 enables the swarm to significantly damp the feedback mechanisms that drive it towards inactivity.

Next, we consider the particular contributions of social cues *se* and *sr*. Social interactions represent a direct form of feedback loops, enabling the swarm to drift towards an absorbing state (e.g., uninterrupted resting or exploring) or maintain a balance between positive and negative feedback. In general, note that high values of *se* often lead to *pr*→*<sup>e</sup>* = 0 due to the high probability of encountering a robot with a failed exploration (due to the low density of food items). With such relatively high values of *se*, *pr*→*<sup>e</sup>* can be reduced to zero within a few time steps. By contrast, with *se* = 0.01, *pr*→*<sup>e</sup>* does not fluctuate as strongly. Similar considerations hold for *sr* and *pe*→*r*. In terms of active exploration, i.e., long exploration times and high number of retrieved items, it is beneficial for the swarm to have robots with *pr*→*<sup>e</sup>* = 1 and *pe*→*<sup>r</sup>* = 0. Indeed, in the present system we observe that the swarm approaches such behavior for *sr* = 0.5 and *se* = 0.01 (i.e., over the *LSe* region). More importantly, such balance of *sr* and *se* allows positive and negative feedback loops to coexist with the positive feedback being slightly more dominant. Due to this feedback coexistence, a robot that happens to be surrounded by unsuccessful neighbors will tend to have low *pr*→*<sup>e</sup>* and high *pe*→*r*, i.e., its resting time *τ<sup>r</sup>* will increase (together with its *d* or *dc*,*rec*) and vice versa. Over time, such dynamics will result in robots that are increasingly inactive (with increasingly higher *τr*, *d* or *dc*,*rec*) and robots that are increasingly active (with increasingly lower *τr*, *d* or *dc*,*rec*). When the majority tends towards active exploration, the inactive group of robots experiences negative feedback while the active group is subject to positive feedback.
The prevalence of the positive feedback decreases the number of consistently resting robots significantly below the number of consistently exploring ones. Ultimately, this leads to skewed or heavy tailed distributions, such as the power law and, consequently, to the emergence of scale-free features. Similar considerations apply to the DCM over the *HSe* region. The difference is that in DCM each robot can broadcast its exploration result only once. Thus, *se* needs to have high values for dynamics similar to CCM to emerge.

To illustrate the above considerations, let us examine the scenario configuration of CCM with *se* = 0.01, *ie* = 0.9, *ir* = 0.0, and *sr* = 0.5 (see Figure 4a), in which a high performance value of *Ncoll* was observed (i.e., this scenario is similar to the peak in Figure 8). The value of *sr* has a high impact on *pe*→*<sup>r</sup>* (the probability to transition to resting). For example, if a robot receives at least two 'success' messages more than 'failure' messages—i.e., if *δη*(*tr*) ≥ 2 in Equation (2)—its *pe*→*<sup>r</sup>* drops to zero. When *pe*→*<sup>r</sup>* = 0 and *ir* = 0.0, the robot will stop exploring only if it finds an item. During the subsequent resting, this robot is likely to cause one of its neighbors to reach *pe*→*<sup>r</sup>* = 0, which repeats an analogous cycle of events. The corresponding dynamics can be interpreted in terms of the positive feedback pushing the robots out of the nest and increasing the number of robots in the foraging area (i.e., in the exploring state). In the long term, due to the positive feedback, the swarm drifts towards the absorbing state in which all robots have *pe*→*<sup>r</sup>* = 0 and *pr*→*<sup>e</sup>* = 1.0. In the short term, while most robots are exploring, some robots remain in the nest, e.g., due to crowding at the entrance of the nest. Those robots have a higher number of neighbors because the nest is significantly smaller than the foraging area. Therefore, during this crowding behavior, the swarm experiences the coexistence of positive and negative feedback loops. A specific balance between these feedback loops may lead to the emergence of scale-free features such as the space feature *d* (for which, indeed, the above mentioned scenario configuration has one of the best truncated power law fits with *ρ<sup>q</sup>* ≈ 0.23, shown in Figure 4a). Similar considerations hold for other CCM examples presented in Figure 4 or the DCM examples in Figure 5.
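The walkthrough above can be condensed into a toy update rule for the transition probability. The clipped linear form below is an assumption consistent with the qualitative description (it is not the paper's exact Equation (2)), chosen so that *sr* = 0.5 together with *δη* ≥ 2 drives *pe*→*r* to zero from any starting value.

```python
def p_explore_to_rest(p, s_r, delta_eta):
    """Toy social update of the exploring-to-resting probability p_{e->r}.

    delta_eta is the net message balance received during exploration:
    (# 'success' messages) - (# 'failure' messages). The clipped linear
    form is our assumption, not the paper's exact Equation (2).
    """
    return min(1.0, max(0.0, p - s_r * delta_eta))

# With s_r = 0.5, a surplus of two 'success' messages zeroes p_{e->r}
# for any starting value, reproducing the behavior described above:
for p0 in (0.2, 0.5, 1.0):
    assert p_explore_to_rest(p0, 0.5, 2) == 0.0

# A surplus of 'failure' messages instead drives the robot towards resting:
print(p_explore_to_rest(0.2, 0.5, -2))  # → 1.0
```

Under this reading, a nest crowded with unsuccessful neighbors (negative *δη*) pushes *pe*→*r* up, while a successful neighborhood pins it at zero, which is the coexistence of feedback loops the text describes.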

The above example demonstrates positive feedback regarding the *exploring state*. However, under some configurations, positive feedback can also be observed around the *resting state*. For example, during the crowding behavior in the nest, a robot which is surrounded by a high number of resting neighbors is likely to get 'stuck' and be unable to leave the nest. This robot will eventually switch to the resting state and broadcast a 'failure' message. Neighbors that receive this message will decrease their probability to explore *pr*→*e*, and, through physical interference, lower the neighboring robots' chances to leave the nest. Consequently, positive feedback at the resting state may lead, in the long term, to an increase of the average communication degree of the resting robots and the emergence of power law distributed features, alongside the occurrence of outstanding robots whose features such as Δ*d* (examples shown in Figure 6), *τ<sup>r</sup>* or *τca* exhibit exceptionally above-average values.

#### **4. Conclusions**

In this paper, we have investigated the interplay between scale-free features and swarm dynamics of a foraging swarm. Our results demonstrate that in the studied system (i) several space and time features tend to be asymptotically scale-free for multiple parameter configurations; (ii) the emergence of scale-free features can be attributed to the presence of positive/negative feedback mechanisms. Furthermore, (iii) in several cases the swarm performance is moderately or strongly correlated with the tendency of space and time features to follow the power law distribution—which is the mathematical backbone of the scale-free property.

This study serves as a first step towards a better understanding of the interplay between the presence of scale-free features and the swarm behavior in terms of collective performance. Although our results do not indicate a causal relationship, we found conclusive evidence for a close connection between scale-free features and swarm performance. Moreover, our analysis of power law distributed features shows a strong link between the microscopic behavior of robots determined by specific cues and the macroscopic behavior of the entire swarm exhibiting peak performance. However, care needs to be taken when considering cases where the power law fit includes only a small fraction of data, as focusing on a small subgroup that plausibly displays scale-free features may disregard a significant piece of information about the global swarm behavior.

Please note that the presented exploratory study was conducted with an emphasis on *whether* scale-free features may emerge autonomously in artificial multi-agent systems, without focusing on *why* they do so. Hence, more sophisticated work is needed to precisely understand the exact causes of the emergent scale-free characteristics in our systems. For instance, strong feedback mechanisms may push the system close to a critical point at which a phase transition occurs. In the case of a *continuous* phase transition, the latter is known to be associated with the emergence of scale-free features [1,53]. In fact, using approximations (such as the assumption of a well-mixed system), it could be shown analytically that the social or internal cues can be used as control parameters, moving the system between its phases (e.g., phases in which the number of resting robots is minimized or maximized). However, if the approach to a phase transition were the cause of the emergence of scale-free features in our experiments, we would expect to find a correlation length (i.e., the distance over which one robot influences another, directly or indirectly) that is longer than the size of the system (e.g., the length of the nest). In contrast, our preliminary analysis indicated the opposite behavior: as we approached those scenarios in which scale-free characteristics were observed, the correlation length decreased below the system size. In general, a detailed finite-size-scaling analysis is necessary to explain our findings more thoroughly as well as to reveal which (physical) boundaries are most relevant and what impact they have on the system dynamics.

The canonical foraging task continues to draw scientific attention due to its importance and prevalence in nature as well as in artificial systems. In addition, the complexity of collective foraging as a combination of several sub-behaviors allows the modeling and analysis of a large number of scenarios and the examination of various features. For these reasons, we focused exclusively on the foraging behavior. However, it would be interesting to extend the scope and investigate other multi-agent tasks, such as aggregation or flocking, from the same perspective as in our study, thereby broadening the understanding of scale-free phenomena in artificial collective systems.

**Supplementary Materials:** The following are available at http://www.mdpi.com/2076-3417/9/13/2667/s1.

**Author Contributions:** Conceptualization, Y.K.; Formal analysis, I.R.; Funding acquisition, P.S.; Investigation, I.R.; Methodology, I.R. and Y.K.; Project administration, P.S.; Resources, P.S.; Supervision, Y.K. and P.S.; Validation, I.R.; Visualization, I.R.; Writing—original draft, I.R.; Writing—review and editing, I.R., Y.K. and P.S.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**





© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Situated Psychological Agents: A Methodology for Educational Games**

#### **Michela Ponticorvo 1,\*, Elena Dell'Aquila 2, Davide Marocco <sup>1</sup> and Orazio Miglino 1,3**


Received: 4 October 2019; Accepted: 12 November 2019; Published: 14 November 2019

**Featured Application: The present paper introduces a methodology based on situated psychological agents that can be fruitfully applied to design and implement educational games, as it permits representing the flows inside the game at the educational, psychological, and pedagogical levels while detailing agents' features at a psychological level.**

**Abstract:** In recent years, the ever-increasing need for valid and effective training to acquire competences in multiform contexts has led to a wide diffusion of educational games (EG). In spite of their diffusion, there is still a need to reflect on the design process, which should embed the games' pedagogical potential and the instructional process in the entertainment scope. Moreover, as building EG, especially in digital environments, is an enterprise that involves specialists with different expertise, it is useful to have a shared methodology that is easily understandable and usable by many users. In this paper, we propose situated psychological agents (SPA) as a methodology to design and build effective EG, show how to represent games in terms of SPA and their interactions through diagrams, and describe different examples of how this approach has been applied.

**Keywords:** educational games; game design; situated psychological agents; education; competences

#### **1. Introduction**

Education is a key step and challenge in every society, as successfully preparing future citizens in terms of knowledge, skills, and competences strongly affects competitiveness at both the individual and the collective level. Not by chance, the European Commission is issuing guidelines to strengthen human capital [1] so that everyone may acquire a key set of competences that allow personal fulfillment and include transversal skills, such as digital competence and entrepreneurship competence.

Competences are more easily acquired through pedagogical models that favor the active involvement of the learner in the acquisition process [2–4], thus opening the way to innovative educational strategies, including educational games (EG) [5–8]. Moreover, the introduction of information and communication technologies (ICT) has led to a revolution in education concerning different aspects, for example, the tools that can be used for education, the places where education can happen, and the possibility to interact with an incomparably higher amount of learning resources and educational figures. ICT has brought about the evolution of new approaches such as technology-enhanced learning and game-based learning [9,10].

It means, for example, that a learner, child or adult, can now access not only books, one of the most used learning sources for a very long time, but also additional multimedia contents, simulations, and social media to obtain information. The world where education takes place is no longer limited to the physical classroom but is expanded to cover a potentially unlimited world, in terms of space and time, that is totally or partially virtual. If we consider the tools that have been introduced in the educational context, it seems that, in spite of their wide diffusion, there is still a need to reflect on the methodology to design, implement, and use them in a learning scenario.

In more detail, it is useful to reflect on the methodology to design and implement educational games: an effective methodology should permit a shared formal representation of the main game elements, their connections, and their interactions.

In what follows, we will propose a methodology that meets this requirement, which is based on the situated psychological agents (SPA) approach, connecting it to the EG design process.

#### **2. Educational and Serious Games: The Design Process**

Serious games are games that educate, train, and inform, to use the title of the highly cited book by Michael and Chen [5]. Since the first book by Abt [11], which referred to card and board games, serious games have become digital and have strongly affirmed their educational potential [12,13]. This fact forces us to critically reflect on this kind of game, as such games intervene in the learning process in a way that has not yet been addressed by traditional learning theory or by typical game theory. Many remarkable theoretical frameworks have been proposed, some quite recently, to satisfy this need. Among these, it is useful to cite the learning mechanics-game mechanics model [14], which draws on a set of pre-defined game mechanics and pedagogical elements abstracted from the literature and connects them to identify the main pedagogical and entertainment features of a game. This work highlights the fundamental game mechanics and how they are translated into learning mechanics. Also noteworthy is the activity theory-based model of serious games [15], which provides a useful representation of EG, with game elements, their connections, and their contribution to the achievement of pedagogical goals. These works have the value of trying to answer the question of how the concrete components of a game should be structured to support learning and which elements are crucial to address in the design process.

The design process, indeed, is extremely important in determining EG success.

The starting point of each design enterprise is to clarify the goal to be reached. In the case of games, the questions are "What is the goal of the game? What can the player do?". In the case of entertainment games, some typical objectives are:

**Erase**: The player must "eliminate" the opponent, as in chess and checkers;

**Solve**: The player must find a solution to a puzzle or answer a question; examples are Cluedo and Trivial Pursuit;

**Chase**: The player runs towards or away from someone or something. One famous example is the video game Super Mario Bros;

**Build:** The player has to build something: A house, a city, an empire, as in The Sims, Civilization, and Age of Empires.

On the other hand, if our goal is to build EG, where the educational aspect is crucial, the designers also have to specify "What is the learning goal of the game? How can it be achieved by actively involving the player?" In this case, help can come from learning theory. For example, Bloom's updated taxonomy [16,17], which is commonly used to describe learning goals and includes remembering, understanding, applying, analyzing, evaluating, and creating, can help focus on what the EG aims to lead the player to. After the learning and game objectives are defined, the design must define three fundamental and interconnected levels: The shell level, the core level, and the educational level. This represents a multi-level approach to design, with two concentric levels, the shell and core levels, and a ubiquitous one, the educational level, also named the evaluation and tutoring level [2]. The level interconnections are represented in Figure 1.

**Figure 1.** A multi-level approach for game design with shell, core, and educational level. Shell and core levels can be found in every kind of game, whereas the educational level characterizes educational games.

#### *2.1. The Shell (Game Narrative) and Core (Game Mechanics) Levels*

The shell and the core levels are present in every game, and in almost every cultural product as well. The shell level represents the visible content that is immediately accessible to the player. It frames the game dynamics within the core level. The educational level, even if present in many entertainment games, is explicitly characterized in educational and serious games, as it allows the teacher to understand whether and how the player/learner has acquired the concepts conveyed by the EG and, in some cases, to intervene directly in the learning process.

At the shell level, superficial and visible, we find the game narrative. EGs, like many other cultural products, are expressed through a narrative metaphor. It is, therefore, important to define who the characters are, what actions they can perform, what interactions are possible between characters, and the environment within which those actions take place. In theatrical jargon, the plot, the scenario, the roles, and the setting are the aspects to be defined.

It is widely recognized that narration is a key aspect of human cognition [18], and it is, therefore, possible to find it in a wide variety of cultural products, such as fairy tales, movies, and news, to cite a few. Games, as cultural products, share this feature, and narration is therefore present in games too [19]. As an example, the characters can be two armies in the chess game, the scenario can be a futuristic world in a videogame, and the plot can be an interaction between relatives in a role-playing game. In the well-known game of Monopoly, for example, a pair of dice is rolled to move a player's piece around the board. Buying and trading properties represents real-estate trading, which strongly helps to engage the player in the negotiation. The shell level, where the narrative resides, conceals a hidden level with specific mechanisms and rules: This hidden level is called the core. Adopting a term commonly used in the context of videogame creation and development, this deep level is the game engine [20]. The game engine implements core functionalities related to game dynamics, for example, physics, animation, artificial intelligence, etc.

These levels interact: One level can have strong effects on the other. The narrative provides a framework within which the hidden content lives, as if in a shell, as the name suggests.

In the context of EG, the shell level is essential to provide a semantic context to the educational activities, whereas the core level is related to skills, abilities, competences to be transferred, and to the relevant learning objectives. It is interesting to note, as hinted at before, that the concepts of core and shell levels are present in every kind of game, not just in educational games.

#### *2.2. The Educational Level*

In EG, a relevant role is played by the educational level, which includes evaluation and tutoring activities, with the explicit educational goal of allowing students to accomplish specific learning outcomes. It is, therefore, important to pay attention to the design of such a key function. In addition, all design decisions at all levels should be harmonized in order to provide a meaningful learning tool, as shown in Figure 1.

At the educational level, the evaluation function analyzes players' game performances relative to the specified training objectives and provides the players and the trainer with important information and data about the learning process. At this level, we find learning analytics, which is the measurement, collection, analysis, and reporting of data about learners, with the intent of improving the learning process as well as the environment in which it occurs. Despite the challenges that can derive from introducing learning analytics in EG, studies report that the effort can lead to greater effectiveness and better measurement of progress in learning [21–23]. Paraphrasing Siemens [24], learning analytics is the use of intelligent, learner-produced data and of analysis models to discover information and social connections for predicting and advising people's learning. From the teacher's perspective, this level is fundamental because it supplies specific tools and functions to support the training process.

#### **3. Agents in EG and the SPA Approach**

The ubiquitous presence of interacting artificial and real actors at each level, together with the importance of the narrative, recalls the theatrical metaphor already presented for the shell level. From the educational point of view, this metaphor is extremely powerful in representing the interactions between the various actors of the educational process in EG. However, the theatrical metaphor effectively applies to all kinds of educational games only when the conception and design of agents is based on and inspired by psychological models, as agents ultimately make choices, take decisions, and act within the environment they live in [2].

Indeed, the various actors populating the different stages of an EG, at the shell level (users, learners), core level (interactions between actors), and educational level (trainers, educators, tutors), can be represented as agents with different features and functions. If we think of EG, it is evident that the people involved in the learning process are a key element on both the educational and the game side. Almost every educational situation is characterized by interactions between the learner, at the center, and the people involved in the whole educational process, both in formal (teachers, educational designers, tutors, etc.) [25] and informal contexts (parents, peers, etc.) [26–28]. Nevertheless, the kinds of interactions that characterize educational settings are varied and show specific nuances that every methodology aiming to model the educational process should take into account.

Looking at Figures 2 and 3, we can see two different implementations of an educational process. The first one is usually observable in children who learn with a teacher through multisensory experience. It is characterized by well-structured educational materials [29–31], e.g., a Montessori-inspired classroom [32], and a well-structured environment. In this case, the teacher can be modeled as an agent that we can generically call a trainer and that directly affects the learners' activities. In this view, the environment within which the learner acts serves as the playground of the learning process. The trainer provides external guidance and support during the play, thus allowing a full understanding at the cognitive level.

**Figure 2.** Learning interactions in a Montessori-type environment.

**Figure 3.** Moreno role-playing games.

The second situation comes from Moreno's role-playing games [33] where the learner acts exactly as an actor, by evolving the scene on the stage according to a given script.

Role-playing games simulate a social situation in which users are asked to take on and interpret specific roles in order to develop a certain competence, such as effective communication or negotiation [34–38]. Here, learners, as actors, perform and develop their actions on a stage according to a specific script. In this way, the stage represents the playground where the learning takes place [29]. Behind the stage, the psychologist, the trainer, or the observers, who can all be seen as agents interacting with the learner, can provide guidance and support, affecting the learning environment without directly intervening on the playground.

These interactions always happen inside the game and can be partially or completely virtual if the agents are ruled by artificial intelligence [39–41].

However, the relevant elements of these learning situations are useful to define a more general methodology:

The playground: A space (physical and/or conceptual) that delimitates the actions of one or more learners. The playground is defined by the narrative structure. It can contain objects (physical and/or conceptual) that can be manipulated by the learner.

Learners: The learners can act in the playground, changing its state directly. They can be considered agents that are situated, immersed in a scene of the play, and that can autonomously select the actions that modify the playground as a function of their psychology, including cognition, emotion, etc.

Trainers: Teachers or other people with educational, training, or assessment functions directly or indirectly affect the learners but cannot modify the playground state.
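To make the distinction concrete, the three elements above can be sketched as a minimal Python data model (our illustrative sketch, with invented names, not part of the SPA formalization itself); note that only learners modify the playground state, while trainers only affect learners:

```python
class Playground:
    """Physical and/or conceptual space delimiting the learners' actions."""
    def __init__(self):
        self.state = {"objects": ["block_A", "block_B"], "moves": 0}

class Learner:
    """Situated agent: selects actions that modify the playground directly."""
    def __init__(self, name):
        self.name = name
        self.guidance = []                        # hints received from trainers

    def act(self, playground, obj):
        playground.state["moves"] += 1            # learners change the state...
        playground.state["objects"].remove(obj)

class Trainer:
    """Affects learners directly or indirectly, but never the playground."""
    def guide(self, learner, hint):
        learner.guidance.append(hint)             # ...trainers only affect learners

pg = Playground()
anna, tutor = Learner("Anna"), Trainer()
tutor.guide(anna, "try block_A first")
anna.act(pg, "block_A")
print(pg.state["moves"], anna.guidance)   # -> 1 ['try block_A first']
```

The design constraint is encoded structurally: `Trainer` holds no reference to the playground at all, mirroring the rule that trainers cannot modify its state.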

In the design process of EG, it is, therefore, necessary to keep in mind the following elements:


Learners and other characters present at the shell level, and therefore belonging to the narrative of the game, can be called on-stage agents (OSA), as they directly interact with and affect the core level according to their specific endowment. Conversely, trainers and other figures that are mainly present at the educational level can be called backstage agents (BSA): They interact with the game indirectly, by affecting OSA actions. This distinction was first introduced by Dell'Aquila and colleagues [2] and Ponticorvo and colleagues [29]. Moreover, it can be adopted at two different levels: (1) that of the educational material, where the interacting elements of an EG can be conceptualized in the form of agents, and (2) that of the learning scenario, where learners are conceived as agents interacting with other agents (real or artificial), thus defining the educational environment.

Taken together, the description of the design process and the focus on interacting agents and playground, are the main elements of the SPA approach for educational games, which allows addressing the EG design both at a high level of abstraction and at a high level of detail.

Therefore, the SPA are agents with different characteristics: They are OSA if they can directly act on the playground, or BSA if they interact with the OSAs from the outside within a well-defined educational process. They are situated, as they are present and somehow "immersed" in the educational process, being in the playground or in the overall narrative structure. They are psychological, as they are endowed with cognitive and emotional features: In the case of human agents, a psychological characterization is automatic; in the case of AI-controlled agents, it is possible to take inspiration from psychological theories and models to define their psychological characteristics and behaviors. It is useful to underline that the agents share the same context; thus, there is a shared meaning between the actors involved in the learning situation.

The SPA approach, at the shell level, identifies the game characters, their characteristics, and their interactions. At the core level, each agent is accurately defined in terms of its sensory and action endowments, i.e., what the agent perceives and what actions it can perform. These actions must follow the game rules, which are defined both by the setting constraints and by the agent actions specified in the core level itself.

SPA can be useful for EG design because all the interacting entities within the game can be represented as agents, some immersed in the playground and some not. Both the players and the roles that guide the learning process from backstage (psychologists, teachers, or trainers) can be conceived as agents. Players become OSA, and supporting figures become agents with specific functions, from supervision or score recording to observation, tutoring, advising, or mentoring. Considering the different levels described in Section 2, we can say that, at the shell level, an EG is a mise-en-scène of a plot by one or more agents interacting in a formally well-defined setting. At the core level, actions performed by OSA directly modify the game state, whereas BSA support OSA at the educational level. It is possible to identify a clear separation between the shell level and the core one: The visible dimension can be conceived through traditional narrative techniques, while the core level, expressed in terms of SPA, implies the formal definition of the various game components that we have introduced.

#### *The Educational Level in the SPA Approach*

In this section, we will focus on the educational level in the SPA approach. BSA are the main characters at the educational level: They may have the function of supporting the learners involved in EG, who mainly act as OSA. BSA do not intervene directly in the playground but provide what is required to enhance the learning process. These agents cover specific roles, functional to the achievement of educational goals. Educational or learning goals are inspired by a specific learning theory, such as the already cited work by Bloom, or by Kolb [42], who emphasized concrete experience, active experimentation, reflective observation, and abstract conceptualization. A meaningful learning process is characterized by the presence of feedback, as giving (and receiving) feedback is essential to understand how close learners are to the defined learning goals. Feedback, together with debriefing, is regarded as the most important element for maximizing the learning process [43], as they guide learners through a reflective process about their learning [44], offer a space for giving personal meaning to the learning experience [45], and help to relate this learning experience to real-life contexts. In the SPA approach, feedback is provided by BSA, and in the case of digital games, it can come from both real and artificial tutors. In digital games, their role is essential to provide learners with short feedback cycles through which they get continuous and immediate information on the effect of their actions on the game interactions. Conversely, in traditional educational approaches, where teachers generally have to mark students' work using conventional means (i.e., manually), there is a significant delay before students can receive the appropriate information about aspects of their task. Digital EG can help to reduce such delays almost to zero. Moreover, feedback is offered throughout the full game session.
A very important moment for delivering feedback is the end of the game, during the debriefing phase, when the learners receive feedback about their overall performance. Debriefing also gives the opportunity to analyze the dynamics that occurred during the game, what went wrong and what was achieved, and to share experiences with other people, making it possible to compare different perspectives from other players or from other people involved in the learning process, such as tutors.

The SPA approach to developing EG opens a way to adopt software based on artificial intelligence to model the interactions between OSA and BSA, and allows these complex interactions between agents to be conceived as directed at a meaningful learning process through feedback and debriefing activities. The tutor can be a human being, but also a virtual entity, thanks to artificial intelligence. In both cases, it is crucial that the tutor observes and traces the behaviors, actions, and reactions of learners during the game to create a learner's profile, similarly to what happens when a student performs a task or takes a test in a face-to-face situation. To create such a profile, educational games can rely on a large amount of available data, even more than in real-life-oriented tasks. Digital games offer a system able to record every single action performed by the players, the time required by each action, as well as the ineffective choices made. Thanks to all this information, both real and virtual tutors can perform various analyses to understand the cognitive state of the learner, thus implementing learning analytics. It is reasonable to hypothesize that real and virtual tutors can be even more effective when they operate jointly, as the virtual tutor can record a significant amount of data and provide immediate feedback, which is impossible for a human tutor, while the human tutor can supervise and actively guide the learning process in a way that is, at the moment, very difficult, if not impossible, for a virtual agent.
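The per-action recording described above can be illustrated with a minimal sketch (hypothetical structure and names, not the data model of any actual system): each entry stores the action, its duration, and whether the choice was effective, from which simple learning-analytics summaries can be derived:

```python
class ActionLog:
    """Minimal sketch of per-action recording for learning analytics.
    Hypothetical structure, not any actual platform's data model."""
    def __init__(self):
        self.entries = []   # (action, duration_seconds, effective)

    def record(self, action, duration, effective):
        self.entries.append((action, duration, effective))

    def summary(self):
        """Aggregate counts and timing a tutor (real or virtual) could inspect."""
        total = len(self.entries)
        ineffective = sum(1 for *_, ok in self.entries if not ok)
        avg_time = sum(d for _, d, _ in self.entries) / total
        return {"actions": total, "ineffective": ineffective,
                "avg_seconds": round(avg_time, 2)}

log = ActionLog()
log.record("choose_sentence_3", 4.0, False)   # an ineffective choice
log.record("choose_sentence_1", 2.5, True)
print(log.summary())   # -> {'actions': 2, 'ineffective': 1, 'avg_seconds': 3.25}
```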

The tutoring agents, whether human or artificial, carry out various roles at different moments and at different levels. At the beginning, they can select and decide which roles of the game will be played by each actor, also according to the learning objectives and to previously achieved results. During the game, the OSA interact with the narrative at the shell level and with the game space, while a BSA can help to maintain a high level of interaction. At the end of the game, the tutor can build an individualized report on the overall interactions that occurred, recording achievements and failures together with preferred ways to act, react, and interact, so as to build a detailed user profile. This report can also be useful to further customize the game/player interaction.

In the next section, we will introduce some examples of EG and present the design process that led to them by means of diagrams with formal notation.

#### **4. SPA Applications to EG**

In this section, we will report three relevant examples of how the SPA approach can be applied to EG design, with particular attention to the formal representation of game elements. To this end, we will use the following notation: The playground is represented as an empty rectangle, a circle represents an OSA, and a square represents a BSA. If the boundary is a full line, the agent is real; if it is a dashed line, the agent is artificial. The interactions are represented as lines: A continuous line represents a direct interaction, a dashed line represents an indirect one, and arcs represent feedback.
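The notation can also be encoded as a small data model, which is convenient when diagrams are to be generated or checked automatically (an illustrative sketch under our own naming assumptions, not part of the original notation):

```python
from dataclasses import dataclass

# Encoding of the diagram notation used in this section (names are ours):
#   shape : "circle" = OSA, "square" = BSA, "rectangle" = playground
#   real  : True -> full-line boundary, False -> dashed (artificial agent)
#   style : "direct" (continuous line), "indirect" (dashed line), "feedback" (arc)

@dataclass
class Agent:
    name: str
    kind: str      # "OSA" or "BSA"
    real: bool

@dataclass
class Interaction:
    source: str
    target: str
    style: str     # "direct", "indirect", or "feedback"

def shape(agent):
    return "circle" if agent.kind == "OSA" else "square"

# Hypothetical encoding of a Block-Magic-like diagram (cf. Figure 5):
agents = [Agent("learner", "OSA", True),
          Agent("teacher", "OSA", True),
          Agent("tutor", "BSA", False)]
links = [Interaction("learner", "playground", "direct"),
         Interaction("tutor", "learner", "feedback"),
         Interaction("tutor", "teacher", "indirect")]

print([shape(a) for a in agents])   # -> ['circle', 'circle', 'square']
```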

#### *4.1. Block Magic*

Block Magic [46,47] is an educational platform that exploits augmented reality based on RFID/NFC technology and allows building custom educational games with both physical and digital components. It consists of a set of magic blocks, a magic board/tablet device, and specific software (Figure 4). Magic blocks are an augmented version of traditional logic blocks, widespread structured materials classically used in education. The technologies employed for the augmentation are RFID/NFC sensors, which unite the manipulative approach stimulated by logic blocks with touch-screen technologies. An RFID system consists of an antenna and a transceiver, which can read the radio frequency and transfer the information to a device, and a small, low-cost tag, which is an integrated circuit containing the RF circuitry and the information to be transmitted.

**Figure 4.** The Block Magic kit.

This configuration permits a PC or tablet, with the BM software installed, to connect with the BM Magic Table, another relevant BM material. The Magic Table has a hidden antenna that recognizes each block, sends a signal to the PC/tablet, and produces feedback consistent with the pupil's learning path.

Each augmented magic block has an integrated/attached passive RFID sensor for the wireless identification of every single block. A specially designed wireless RFID reader device, the active board, is used, which can read the RFID of a block and transmit the result to the BM software engine.

On the software side, the BM augmented blocks together with the Magic Table are complemented with software that includes a series of already-developed exercises and an authoring tool to build new ones.

The BM software engine mainly comprises two parts: The first is devoted to receiving input from the active board and generating an "action" (aural and visual). These actions implement the direct feedback the user receives when interacting with the system. This feedback is regulated by an embedded intelligent tutoring system [48,49] that ensures autonomous interaction between the user and the system, with the user receiving active support, corrective indications, feedback, and positive reinforcement from the digital assistant on the outcome of the actions performed.

The second software component is devoted to customization and is dedicated to teachers, educators, etc., allowing them to build their own exercises to propose to the child, focusing attention on the skills the child most needs to train.
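The interplay of the two components can be sketched as follows (a hypothetical simplification with invented names and exercise content, not the actual BM engine): the authoring component produces an exercise definition, and the feedback loop maps each block identifier read by the active board to an aural and visual action:

```python
# Hypothetical exercise definition, as built with the authoring component.
# Tag ids, prompts, and hints are invented for illustration only.
exercise = {
    "prompt": "place the large red triangle",
    "expected_tag": "tag-042",           # RFID id of the correct block
    "hint": "look for the biggest red block",
}

def on_block_read(tag_id, exercise):
    """Generate an (aural, visual) action for the tag read by the active board."""
    if tag_id == exercise["expected_tag"]:
        # positive reinforcement
        return ("play: well done!", "show: green star")
    # corrective indication with a hint
    return ("play: try again", "show: " + exercise["hint"])

print(on_block_read("tag-042", exercise)[1])   # -> show: green star
```

An intelligent tutoring system would sit on top of such a loop, e.g., adapting which hint is shown based on the history of the child's attempts.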

In BM, the narrative comprises the plot, the scenario, the characters, and the setting, and has the task of attracting the player and filling the game experience with meaning. An appropriate narrative attracts the child and immerses him or her in a completely different environment, which is relevant in every educational context. The narrative level exercises a framing effect on the core level.

The core is configured as an interaction between the player, the human OSA (or the players in a collective scenario), the teacher, another human OSA, and the artificial BSA. The interactions are mediated by physical materials: The Magic blocks.

#### Block Magic Representation in SPA Terms

From the general description of BM, we can move to its description in terms of the SPA approach. As represented in Figure 5, in this case we have two human OSA interacting: A learner and a teacher. Many important functions are played by the BSA, which is artificial. It provides feedback to the player (arc on the left) during the game, it affects the human OSA teacher by proposing existing exercises and recording the learner's interactions, and it has an indirect effect on the learner OSA through the trainer OSA. The BSA is built according to adaptive tutoring system theories [46,47].

**Figure 5.** Block Magic represented in the SPA notation (the playground is represented as an empty rectangle, circles represent on-stage agents (OSA), and the square represents the BSA. Full-lined boundary indicates a real agent, dashed line indicates an artificial agent. A continuous line represents a direct interaction, a dashed line represents an indirect one, arcs represent feedback).

To validate the SPA approach, some data were collected with BM users. During the BM project (www.blockmagic.eu), trials were run in four schools in European countries and involved about 250 students and 10 teachers of primary school and kindergarten. The teachers used pre-defined exercises but could also build their own using the BM authoring tool, which is based on the SPA approach. After this process, the researchers administered a structured questionnaire to the teachers, with 10 questions on a five-point Likert scale (5 indicating the most positive attitude) about the design process with SPA.

Results indicate that the design was facilitated by SPA: In particular, it was appreciated for the possibility to quickly define interactions in the exercise (average = 4.30, st. dev. = 0.48; point 5 on the scale corresponds to the most positive evaluation), to define the functions in the game, especially the educational one, in terms of agents (average = 4.10, st. dev. = 0.74), and to share the idea in an easy and manageable way with other professionals involved in the process (average = 4.40, st. dev. = 0.52). More details are reported in [46].
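As a consistency check (these are NOT the collected data, which are reported only in aggregate, but hypothetical rating vectors), distributions of 10 teacher ratings can be constructed that reproduce the reported averages and sample standard deviations:

```python
from statistics import mean, stdev

# Hypothetical rating vectors for the 10 teachers, constructed only to
# show that the reported figures are internally consistent with n = 10
# on a five-point scale (sample standard deviation assumed).
quick_interactions = [4] * 7 + [5] * 3            # reported: 4.30, 0.48
agent_functions    = [3] * 2 + [4] * 5 + [5] * 3  # reported: 4.10, 0.74
easy_sharing       = [4] * 6 + [5] * 4            # reported: 4.40, 0.52

for item in (quick_interactions, agent_functions, easy_sharing):
    print(round(mean(item), 2), round(stdev(item), 2))
# -> 4.3 0.48
#    4.1 0.74
#    4.4 0.52
```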

#### *4.2. Enact: An EG to Teach Negotiation*

Another example of SPA used to develop EGs is Enact, implemented on a platform based on recent psychological modeling and on the application of current ICT research areas such as e-learning, mobility, the internet, and artificial intelligence [50,51].

The platform facilitates "learning by doing" experiences, as the training scenarios developed through EG can simulate real-life situations and, due to their verisimilitude, enable the transfer of what has been learned to similar real-life contexts [52,53], developing specific negotiation competences.

It is a single-player game designed to train users on effective communication and negotiation skills. A training scenario is populated by two 3D avatars, one controlled by the user and the other by the computer (the BOT), both able to express a range of communication aspects and elements by using verbal cues (e.g., vocal tone, shape of the speech bubble, and structure of the sentence), and non-verbal indicators (e.g., body posture, facial expression, eye contact, and gestures). These patterns of behavioral indicators have been identified in the communication model of assertiveness, passivity, and aggression [54].

The on-stage agents within Enact are the learner and the artificial agent with which the user interacts during the game (see Section 4.2.1). OSAs perform their roles and interact with each other according to the theoretical principles of the five styles of handling interpersonal conflict proposed by Rahim and Bonoma [55] and Rahim [56], the psychological model adopted and underpinning the Enact game.

In other words, the main principles of the two theoretical psychological models underpinning the game, negotiation by Rahim and communication by Dryden and Constantinou, represent the rules defined at the core level that determine the OSAs' psychological and physical features. The Rahim model differentiates five styles of handling conflict along two basic dimensions: Concern for self and concern for others. The first dimension expresses the degree (high or low) to which a person attempts to satisfy his or her own concern, while the second expresses the degree (high or low) to which a person attempts to satisfy the concern of others. The combination of the two dimensions results in five styles of handling interpersonal conflict: Integrating, obliging, compromising, avoiding, and dominating.

The five styles of handling interpersonal conflicts are described, as follows:

Avoiding (low concern for self and others) has been associated with withdrawal, buck-passing, or sidestepping situations.

Obliging (low concern for self and high concern for others) is associated with attempting to play down the differences and emphasize commonalities to satisfy the concern of the other party.

Dominating (high concern for self and low concern for others) has been identified with a win-lose orientation or with forcing behavior to win one's position.

Compromising (intermediate in concern for self and others) involves give-and-take where both parties give up something to make a mutually acceptable decision.

Integrating (high concern for self and others) involves openness, exchange of information and examination of differences to reach an effective solution acceptable to both parties.
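The two-dimensional classification above can be sketched as a small function (our simplification: the model's qualitative high/low/intermediate levels are approximated here with numeric thresholds, which are not part of Rahim's formulation):

```python
def rahim_style(concern_self, concern_others):
    """Map the two concern dimensions (scaled to 0..1) to one of the five
    Rahim styles. The 0.4/0.6 thresholds are our own simplification."""
    def level(x):
        return "low" if x < 0.4 else "high" if x > 0.6 else "mid"

    s, o = level(concern_self), level(concern_others)
    if s == "mid" and o == "mid":
        return "compromising"   # intermediate in concern for self and others
    if s == "high" and o == "high":
        return "integrating"    # openness, exchange of information
    if s == "high" and o == "low":
        return "dominating"     # win-lose orientation
    if s == "low" and o == "high":
        return "obliging"       # play down differences
    if s == "low" and o == "low":
        return "avoiding"       # withdrawal, sidestepping
    return "compromising"       # mixed mid/high or mid/low: nearest style

print(rahim_style(0.9, 0.2))   # -> dominating
print(rahim_style(0.5, 0.5))   # -> compromising
```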

Moreover, the conflict scenarios have been designed according to a series of variables whose combinations resulted in 25 different conflict scenarios animated by 24 different characters: The type of conflict (based on divergence or convergence), gender (whether player and agent have the same or opposite gender, so that the interactions can be male-male or female-female, and male-female or female-male), and ethnicity (to allow a user-avatar interaction covering different ethnic groups).

The user is introduced to the game with a scene explaining the conflict situation, the role assigned to the user, and his or her goal within the given scenario (shell level). Each exchange between the user and the BOT is organized in a five-state scene (one state for each of Rahim's styles of handling conflict), which includes one turn of speech for each party. Each exchange is related to a gesture and/or facial expression that shows the way the sentence will be communicated to the BOT (core level).

After the user's answer, the BOT processes it according to the embedded psychological models: For example, a dominating BOT will show predominantly aggressive and authoritative behaviors. Conversely, an obliging BOT will show an overall passive and submissive attitude towards the negotiation (Figure 6).

**Figure 6.** OSAs interacting at the shell level at the beginning of Enact. Introduction to the different OSAs.

The user starts the game by pressing the "play" button that brings the player on the game scene: The user's avatar is presented in a small window at the left upper corner of the screen, while the BOT represents the main character focused on by the camera.

The user's five possible choices are shown below the small avatar window, while the responses of the BOT are shown in the text bubble appearing over its head.

When the mouse is over one of the five user sentences (on the left-hand side of the screen), the animation (non-verbal behavior) related to that sentence is shown in the top-left window.

The innovative aspect of the Enact game is its assessment feature, which complements the training aspect. It implements soft-skills measurement with an innovative, rigorous psychometric approach that offers users the opportunity to assess their conflict-handling styles, along with their negotiation and communication skills.

The assessment within Enact corresponds to the core of what we have defined as evaluation/tutoring level and represents the playful way through which the user can be assessed in a standardized manner according to the abovementioned Rahim's model.

The assessment of the player is based on the preferred negotiation styles used during a series of negotiation scenarios, given the description of the five styles provided by Rahim, and on the paper-and-pencil ROCI-II instrument developed by the author. ROCI-II is designed to measure the five independent dimensions of the styles of handling interpersonal conflict (integrating, obliging, dominating, avoiding, and compromising). The instrument contains three forms, A, B, and C, to measure how a person handles his or her conflict with supervisors, subordinates, and peers, respectively.

The Enact assessment is also fundamental for the automatic elaboration of a training strategy tailored to the player's specific development areas, creating an effective learner-centered environment where the user's activity is focused on the behavioral areas that most require improvement.

The Enact profiles resulting from the users' game experiences are correlated with those obtained by the users through the administration of the ROCI-II (Rahim organizational conflict inventory-II). For this reason, the Enact tool has been designed to return a score directly comparable with the ROCI-II, producing scores for each of the five styles of handling conflict contemplated in Rahim's model: Integrating, obliging, dominating, avoiding, and compromising. In addition to the ROCI-II form C, four other psychological tests were administered: (a) A short version of the Big Five personality inventory, (b) an assertive efficacy test, (c) a self-efficacy test, and (d) a coping test. The aim was also to investigate possible relationships between high scores of self-efficacy and relevant personality traits, on the one hand, and the styles adopted by the Enact users and related positive effects on the negotiation processes observed within the game sessions, on the other.

All the test takers had to play Enact and fill in the electronic form containing the five psychological tools in a row, in random order so as to avoid bias related to the order of presentation. The users were asked to negotiate with an avatar in 10 different scenarios. The assessment took about one hour.

The system collects data about the user's behavior and choices and creates a model of the player that is then used for generating tailored information for the training session. The score and profile of the player's negotiation skills are calculated by summing the independent concern-for-self and concern-for-other variables gathered during interactions, which are attached to every sentence that the user can choose.
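As a concrete illustration, the dual-concern scoring just described can be sketched as follows; the sentence weights, thresholds, and function names are our own illustrative assumptions, not the actual Enact implementation:

```python
# Illustrative sketch of dual-concern scoring: every selectable sentence
# carries a (concern-for-self, concern-for-other) pair; the player's
# profile aggregates the pairs of the sentences actually chosen.
# Weights and thresholds below are hypothetical.

def concern_profile(choices):
    """Average the concern-for-self / concern-for-other weights of the
    sentences a player chose across the negotiation scenarios."""
    n = len(choices)
    self_avg = sum(s for s, _ in choices) / n
    other_avg = sum(o for _, o in choices) / n
    return self_avg, other_avg

def dominant_style(choices, lo=0.4, hi=0.6):
    """Map the averaged concerns onto Rahim's five styles via the
    dual-concern grid: high/high = integrating, low/high = obliging,
    high/low = dominating, low/low = avoiding, middling = compromising."""
    s, o = concern_profile(choices)
    if lo <= s <= hi and lo <= o <= hi:
        return "compromising"
    if s > hi and o > hi:
        return "integrating"
    if s > hi:
        return "dominating"
    if o > hi:
        return "obliging"
    return "avoiding"
```

In this reading, producing a ROCI-II-comparable profile amounts to reporting the aggregated concern scores per style rather than only the single dominant label.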

In the assessment session, the artificial agent's behavior is static, not adaptive, and reflects a specific negotiation style for each of the scenarios.

The tutoring system is available only after the assessment has been completed. It intervenes during the training scenarios and at the end of the game session in order to provide useful information to the user about his or her performance with the BOT he or she is currently interacting with, and about his or her general behavior when managing conflict situations.

The user is given a profile based on the Rahim model related to the specific situations he or she played, together with advice about how to improve the efficacy of his or her communication and the changes achieved since the assessment profiling.

The profile emerges mainly through a comparison of the behavior of the user and the style of the artificial agent she interacted with.

Furthermore, we have highlighted the importance of offering the user immediate feedback about his or her performance. An example of immediate feedback is provided in the Enact game session by the on-stage agent (Figure 7).

**Figure 7.** Examples of verbal and nonverbal indicators expressed by OSAs during the conflicting interaction.

The BOT which the user interacts with conveys the immediacy of the interaction through aggressive, assertive, or obliging facial expressions and body posture (non-verbal communication) and gives verbal feedback through text (Figure 7).

#### 4.2.1. Enact Representation in SPA Terms

In the Enact game, as shown in Figure 8, there are two on-stage agents: The learner who plays the game and the artificial agent with which the user interacts in the scenario. The human OSA performs his/her role according to his/her negotiation style, whereas the artificial OSA acts according to the implementation of Rahim's principles. The OSAs interact directly with each other through questions and answers. In this case, there is also a BSA, an artificial tutor that interacts directly with the artificial OSA and indirectly with the human OSA. It is outside the playground but directly affects the artificial OSA and indirectly the human OSA.

**Figure 8.** Enact game represented in the SPA approach (the playground is represented as an empty rectangle, circles represent OSAs, and squares represent BSAs. A full-lined boundary indicates a real agent, a dashed boundary an artificial agent. A continuous line represents a direct interaction, a dashed line an indirect one, and arcs represent feedback).

At the end of the game, the BSA also provides the human OSA with relevant feedback.

During the Enact project, the effectiveness of the SPA approach was also investigated. Indeed, the Enact game was pre-validated in two iterations: The first one allowed the collection of feedback, by means of a questionnaire, on the quality of the interface and the BOT. The questionnaire was composed of eight questions on a five-point Likert scale. The complete results are reported in [2–53].

Data showed that the overall feedback was extremely positive. The second iteration involved the participants playing with different scenarios, after which a questionnaire of 13 questions on a Likert scale was administered. In this case too, the feedback provided by users was positive.

On the qualitative side, the people involved in the design and implementation of the Enact game reported that using the SPA was useful and allowed them to collaborate efficiently with the other professionals involved in the game development.

#### *4.3. Eutopia*

#### 4.3.1. Eutopia: EG to Train Soft Skills Based on Role-playing Mechanisms

Eutopia represents a specific application of SPA to develop EGs, as it is not just a game but rather a platform with which it is possible to create an unlimited number of role-playing games.

The Eutopia platform builds on many years of experience gained in several European projects, such as the Sisine, Sinapsi, Eutopia-Mt, Proactive, and S-cube projects. Eutopia has been used and tested in different contexts and by different target groups (universities, training institutions and agencies, MEs and SMEs, public administration, as well as non-governmental organizations and social enterprises) and for the development of various kinds of competencies (negotiation, international mediation, communication, leadership, team building, time management, motivation, decision making, and problem solving).

Eutopia takes inspiration from the technology used in multiplayer games and embeds role-play methodology as a psycho-pedagogical approach.

The underpinning learning approach is based on open dynamics so that there is not an exclusive way to achieve the desired learning objectives.

The technological dimension allows a virtual extension of traditional face-to-face role-playing activity, transposing it into a digital setting. This enhances the potential of the training experience in which learners are involved. Eutopia recreates a graphical world populated by virtual actors (avatars) controlled by real users.

While the role-playing methodology that derives from psychodrama and sociodrama [33] has learning purposes, role-playing videogames are created for recreational purposes and take inspiration from pen-and-paper role-playing games. Indeed, role-play [30] has been extensively recognized as a powerful technique for enhancing traditional training practice, boosting participants' learning experience, facilitating knowledge, and promoting skills, competencies, and group as well as personal development in face-to-face activities [57–60].

Since its origins, the role-play technique has been variously adapted and applied to different settings and contexts, for different purposes, and in many disciplines (e.g., psychology, organizational change, sociology, and pedagogy) to intensify and accelerate learning and to develop new ways of understanding concepts and knowledge.

Role-plays can be adopted to deal with personal (psychodrama) or collective (sociodrama) issues and used to exercise a variety of specific skills (learning simulations).

Moreover, role-play games can be considered as learning strategies that can be enhanced through technology by extending learning through added dimensions that may be impossible to achieve in face-to-face situations [61]. Among them are the so-called massive multiplayer online role-playing games (MMORPGs) and multi-user virtual environments (MUVEs), such as Second Life (http://secondlife.com/education/) and Active Worlds (http://www.activeworlds.com/edu/).

MMORPGs derive from role-playing video games, which in turn take their origins from pen-and-paper role-playing games (e.g., Dungeons and Dragons) and use much of the same terminology, settings, and game mechanics.

Regarding the technological dimension, Eutopia, in addition to the functions normally provided in MMORPGs and MUVEs, offers specific features designed to facilitate its use in distance learning. In particular, it has been used to develop a variety of role-playing games for the development of different soft skills.

In summary, the platform is based on a client/server architecture, which comprises three different software pieces for users:


Through and within Eutopia, trainers can assume potentially different roles. They can act as a playwright by writing storyboards, as a screenwriter by personalizing training scenarios, as a casting director by assigning roles to be played out, as a movie director by monitoring and guiding participants' actions and behaviors, as a director of photography by selecting relevant dynamics to be recorded, and as a film critic by giving actors personalized feedback (debriefing phase).

By creating storyboards, trainers can define the properties of training scenarios along with the psychological and physical features of the different roles to be played by participants (Figure 9). They also act as a guide for using the learning platform features at their best, exploring the learning potential of the available tools.

**Figure 9.** Script definition.

The use of feedback and debriefing systems allows the full potential of trainers' guidance, facilitation, and support to be exploited.

The Eutopia virtual environment provides an avatar-based system of communication, mediated by the artificial agents representing the human trainer and the learners (BSA and OSAs, respectively).

By using the Eutopia Editor, trainers can write the storyboard for online multiplayer games. Its design requires an accurate definition of learning goals, narrative, and roles to be enacted and of the physical and psychological features of avatars (Figure 10).

Learners act out their roles by interacting in a virtual, navigable environment provided by the system, controlling virtual alter egos, the avatars.

These represent what we have defined as OSAs, as they directly act and interact in the virtual environment, influencing the dynamics of the game and impacting its process.

Learners can communicate via short text messages, which appear in bubble cartoons over their avatars' heads.

They can also interact by using various forms of para- and non-verbal communication (expressed by emoticons and facial expressions that can be assumed by avatars).

For example, players can decide the loudness (shown by the font size of the text in the bubble) and emotional tone (shown by the shape and color of the bubble) of a message.
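A minimal sketch of this kind of para-verbal mapping (all scales, shapes, and colour names are invented for illustration; they are not Eutopia's actual rendering parameters):

```python
# Hypothetical mapping of para-verbal cues onto bubble rendering:
# loudness drives the font size, emotional tone drives the bubble's
# shape and colour. All concrete values are illustrative assumptions.

TONE_STYLES = {  # emotional tone -> (bubble shape, colour)
    "neutral":  ("rounded", "white"),
    "angry":    ("spiky",   "red"),
    "friendly": ("cloud",   "lightblue"),
}

def render_bubble(text, loudness=1, tone="neutral"):
    """Return rendering parameters for one chat bubble."""
    shape, colour = TONE_STYLES[tone]
    return {
        "text": text,
        "font_size": 12 + 4 * loudness,  # louder message => bigger text
        "shape": shape,
        "colour": colour,
    }
```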

Players can control the gestures and body movements of avatars, for example, by making the avatar wave goodbye, point at someone, or hug someone.

They can "whisper" messages to each other, that is, send messages that are visible only to the players directly involved in the conversation and to the trainer.

Finally, they can communicate with the trainer and raise any questions to receive guidance or clarification. After scripting and starting the role-playing session, trainers can intervene in the interaction among learners in two possible ways.

The first is to act as an invisible stage director, that is, to behave as a back-stage agent, using a variety of features to observe interactions among players. The second is to intervene directly in the game. For example, trainers can take the role of a character in the scenario and play the game like other players. They can also activate events to change the dynamics of actual interactions. These represent cases in which the role of BSAs coincides with that of OSAs.

When the game is concluded, they can provide players with personalized feedback, assessing whether, and to what extent, the group and individual goals have been achieved; encourage group discussion; and examine the most significant aspects and dynamics that emerged, as well as the main strategies adopted by players.

Indeed, an embedded tutoring tool enables trainers to record training sessions and replay role-play interactions in order to provide feedback on significant interactions between participants, encourage the communication process, mutual sharing, self-reflection, and self-discovery, and help identify potential areas of personal development. Feedback can be provided immediately after role play or in a later feedback session.

**Figure 10.** Avatar control as a way to explore an online session.

#### 4.3.2. Eutopia Representation in SPA Terms

In the Eutopia platform, as shown in Figure 11, there are many on-stage agents that interact virtually: They are human OSAs. They perform their roles following the defined script and the trainer's (BSA's) indications. In this case, all the agents interact both directly and indirectly.

**Figure 11.** Eutopia platform represented with SPA notation (the playground is represented as an empty rectangle, circles represent OSAs, and squares represent BSAs. A full-lined boundary indicates a real agent, a dashed boundary an artificial agent. A continuous line represents a direct interaction, a dashed line an indirect one, and arcs represent feedback).

At the end of the interaction, the BSA offers the OSAs feedback and reflections on the different interactions.

Eutopia has been used and tested in different contexts and by different target groups (universities, training institutions and agencies, MEs and SMEs, public administration, as well as non-governmental organizations) and for the development of various kinds of soft skills within different research projects, such as Sisine, Sinapsi, Eutopia-Mt, Proactive, and S-cube (more information at www.nac.unina.it). In particular, to study attitudes towards the SPA approach, we investigated the perceptions of 18 experienced professionals (educators, trainers, psychologists, and educationalists adopting role-playing activities in traditional settings) regarding the use of role-playing games in educational and training contexts, with a specific focus on the Eutopia platform.

They completed a questionnaire on their perception of how online role play can encourage and foster meaningful learning experiences among participants. More details are reported in [2].

With regard to the methodological effectiveness of online role-play via SPA agents, we can affirm that it is generally considered effective. A large consensus among the professionals was found on the role of the trainer, both virtual and real, as conceived in the SPA approach.

#### **5. Discussion and Conclusions**

In this paper, we have introduced the SPA approach to developing EGs. This approach presents various advantages. It opens the way to adopting automatic control systems and software based on artificial intelligence to model OSA and BSA behavior, as shown in the application section. By these means, it is possible to delegate both on-stage and backstage functionalities to intelligent and autonomous artificial agents, making it possible to run EGs with mixed teams composed of human and artificial agents. It is, in fact, easier to build artificial agents to support the educational enterprise than to model educational functions and features separately. Moreover, SPA makes it possible to reproduce, model, and feed the dialogic interaction, offering a formal representation of the people involved in the learning/teaching dynamic.

SPA offers an effective methodology to build games operating on the shell and core levels as well as the educational one. This means that the same core level can be combined with different shell levels so as to be adapted to different contexts, allowing the comparison of various populations (e.g., children and adolescents) and various areas of application (e.g., education, training, or assessment).
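The shell/core combination can be pictured with a toy example; the class names and the simplistic scoring rule are hypothetical, not taken from the SPA implementations:

```python
# Toy illustration of the shell/core separation: one core level (rules
# and assessment logic) reused under different shell levels (narrative,
# audience, artwork). All names are invented for the example.

class NegotiationCore:
    """Core level: game rules and scoring, independent of presentation."""
    def evaluate(self, concern_self, concern_other):
        # a deliberately simplified stand-in for the real assessment logic
        if concern_self > 0.5 and concern_other > 0.5:
            return "integrating"
        return "other"

class Shell:
    """Shell level: context-specific wrapping around an unchanged core."""
    def __init__(self, core, audience, narrative):
        self.core = core
        self.audience = audience
        self.narrative = narrative

# the same core serves two differently "skinned" games, which is what
# makes cross-population comparisons meaningful
core = NegotiationCore()
kids_game = Shell(core, audience="children", narrative="playground dispute")
work_game = Shell(core, audience="adults", narrative="salary negotiation")
```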

Last but not least, it proposes a comprehensive framework that can be easily understood by specialists with different expertise. EG design and development involves education specialists, teachers, and trainers, as well as computer scientists, software engineers, and others. These specialists can share their knowledge through this framework in a very effective way.

However, a possible shortcoming of this approach comes from the consideration that there are games, as well as educational software, for which there might be no need to define the rules of interaction in terms of psychological agents.

It is possible to summarize that the strongest point of the approach used within SPA to develop EGs is related mainly to the educational aspect, allowing users to foster transversal skills through innovative approaches to teaching, learning, and assessment. The EGs proposed are based on two different educational approaches reflected in the implementation of the SPAs. From a technological perspective, it is possible to distinguish EGs more centered on allowing a virtual extension of traditional face-to-face psychodramatic mechanisms and experiences (e.g., Eutopia) from those that instead reproduce "artificial" worlds based on computer-simulated, formal models of specific phenomena or theories to investigate (Enact and BM).

From an educational and user-centered perspective, it is possible to identify two main categories. One category can represent the extent to which, while playing the game, the user has to express herself through behavioral acts that involve her body or other forms of interaction, such as an actor would do on stage. Those elements correspond to the traditional behavioral domain that plays a prominent role in psychodrama, as we have highlighted for Eutopia and Enact, though with different degrees of involvement and immersion. Situations like BM, in which the user is asked to perform abstract and strategic forms of decision-making, are different from and yet complementary to these kinds of games. Here, the user's logical and reasoning aspects are prominently highlighted.

The educational approach underpinning Eutopia is based on open dynamics. Therefore, there is no unique way to achieve the desired learning objectives. The technological dimension enhances the potential of the training experience because it makes a virtual extension of traditional face-to-face role-playing activity possible, transposing it into a digital setting. What emerges is that the figure of the trainer simultaneously represents a source of strength and weakness.
On the one hand, it is undeniable that a real BSA trainer can enrich game performance by providing facilitation and adaptable performance feedback. On the other hand, the study presented shows that the need for fully skilled trainers may increase the cost and time of training.

Moreover, the dynamics resulting from the gameplay depend on learners, rather than on any form of artificial intelligence. This means that participants are offered a far richer, more open, learning experience than would have been possible if they had to interact with artificial OSAs and BSAs. However, the disadvantages of this method are represented by high cost and time consumption in organizing and managing the complexity of the virtual learning scenarios, as well as interactions among participants. Indeed, the critical element that emerged is related to the trainers/teachers' role in managing the online role-plays, and their need to be skilled in mastering different competencies at once.

Those limits have induced the authors to consider the advantages of introducing game technologies that are less dependent on the supervision of real BSAs, such as Enact.

In this case, although the system allows users to dramatize and enact role-plays, the complexity of the dynamics between OSAs is limited by the rules of the game to a certain number of actions, and the responsibility of the BSA is certainly reduced. Therefore, the assessment and observation of the learning experience is less subject to the influence and interpretation of many other potentially interfering variables.

While Eutopia and Enact allow users to experience direct involvement with the learning objectives through personal dramatization by acting out roles, BM focuses instead more on the logical and reasoning aspects involved in the gameplay. In this case, a set of formal rules and interactions embedded in the game needs to be followed for learners to achieve the relevant learning objectives.

This brings us to another aspect of our experience, that is, the appropriateness of the use of EGs. The decision on which game to use depends largely on the skills to be developed, as well as the resources and the time allocated for achieving the learning objectives. For instance, if a learning objective concerns training from the cognitive domain, and the priority is making players learn and assess specific skills or behaviors (e.g., problem-solving requiring a quick response), the ideal methodology is more likely to be based on more structured games, such as BM. Indeed, the educational resources and the learning path that learners have to follow are easily accessible to learners at any time from anywhere. Moreover, the set of formal rules and interactions to be followed to achieve the relevant learning objectives is embedded in the software and does not require the constant presence of experienced real external guidance as a BSA. BM and Enact can drive the player to a stable training outcome more rapidly than open dynamic situations, like Eutopia. Therefore, the advantage of this method lies in the fact that it is very low cost: after an initial phase to familiarize users with the system, it can be used without the guidance of a trainer, as the system is self-regulated and enables learners to achieve objectives rapidly. Conversely, if the competencies to be developed are more related to aspects of emotional awareness, self-assessment, and self-confidence, we think that a methodology such as Eutopia, closer to the traditional role-play technique, might be the most appropriate. For all the EGs presented, we can acknowledge that providing the software with authoring systems has been valued as an extremely beneficial aspect, as it allows trainers to rapidly develop their own scenarios, personalizing their work for specific target populations with specific learning needs.
In this light, there are many possible and potential areas of application.

The strongest points of the approach used within EGs are related mainly to the central role assigned to the player in the training or assessment processes developed within the software. The users can enhance their attitudes towards different skills, improve their capabilities, understanding, and practice with the support of the tutoring system provided, and following customized training sessions.

The experiences of the EU projects confirmed the value of using information technology as a tool placed in the hands of a trainer for the development of controlled ad hoc learning exercises, rather than being considered a simple replacement for trainers and learners.

The SPA approach presents a novel element of flexibility, both in the delivery and in the practice of different skills and competencies training, where users can broaden the practice of different skills outside the traditional classroom approach by leveraging Internet technologies. What is even more interesting, professionals can be in total control of the model implemented and of the training and assessment processes. Furthermore, every skill or competence that requires the exploitation of people's interactions could benefit from such a realization of an SPA to develop EGs.

The reason is that, whatever skills need to be transferred in the digital role-play, the educational technological level represented within the software enables modification of both the narratives and the educational models underlying the training requirements.

**Author Contributions:** Conceptualization, M.P., E.D., D.M., O.M.; methodology, M.P., E.D., D.M., O.M.; writing—original draft preparation, M.P.; writing—review and editing, M.P., E.D., supervision, D.M.; project administration, O.M.

**Funding:** The BM Game was developed within the LLP Lifelong Learning Programme; the Enact Game was developed within the framework of the ENACT Project, funded by EACEA under the Erasmus+ KA3 measure; EUTOPIA has been developed within the framework of projects funded by the Lifelong Learning Programme (EACEA).

**Acknowledgments:** The authors would like to thank Onofrio Gigliotta for the graphical abstract design and drawing.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Towards Agent Organizations Interoperability: A Model Driven Engineering Approach**

#### **Luciano R. Coutinho 1, Anarosa A. F. Brandão 2, Olivier Boissier <sup>3</sup> and Jaime S. Sichman 2,\***


Received: 4 May 2019; Accepted: 7 June 2019; Published: 13 June 2019

**Abstract:** In the research and development of *multiagent systems* (MAS), one of the central issues is how to conciliate the autonomy of the agents with a desirable and stable behavior of the MAS as a whole. *Agent organizations* have been proposed as a suitable metaphor for engineering social order in MAS. However, this emphasis has led to several proposals of *organizational models* for MAS design, thus creating an *organizational interoperability problem*: How to ensure that agents, possibly designed to work with different organizational models, could interact and collectively solve problems? In this paper, we have adopted techniques from *Model Driven Engineering* to handle this problem. In particular, we propose an abstract and integrated view of the main concepts that have been used to specify agent organizations, based on several organizational models present in the literature. We apply this integrated view to design MAORI, a model-based architecture for organizational interoperability. We present a MAORI application example that has shown that our approach is computationally feasible, enabling agents endowed with heterogeneous organizational models to cooperatively solve a problem.

**Keywords:** interoperability; multiagent systems; organizational models

#### **1. Introduction**

In the research and development of *multiagent systems* (MAS), one of the central issues is how to conciliate the autonomy of the agents with a desirable and stable behavior of the MAS as a whole. Borrowing ideas from the Social Sciences, some authors have named this issue the *problem of social order*: "How to obtain from local design and programming, and from local actions, interests, and views, some desirable and relatively predictable/stable emergent result" [1]. A closely related issue is the *problem of social consensus* often characterized as how to reach agreement with regard to some aspect or quantity of interest in a network of agents by combining the local preferences or states of individual agents [2,3]. Both social order and social consensus are fundamental problems in the design of MAS. While social order stresses the idea that agent behaviors must be coherent with the MAS global purpose, social consensus highlights the need of agreement among agents working together for a global purpose.
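As an aside, the idea of social consensus as combining local states can be illustrated with a classic average-consensus iteration; the three-agent network and initial values below are invented for the example:

```python
# Minimal average-consensus sketch: each agent repeatedly replaces its
# local value with the mean of its own and its neighbours' values,
# so the network converges towards agreement on a common value.

def consensus_step(values, neighbours):
    """One synchronous update: every agent averages over itself and
    its neighbours."""
    return {
        agent: (values[agent] + sum(values[n] for n in neighbours[agent]))
               / (1 + len(neighbours[agent]))
        for agent in values
    }

values = {"a": 0.0, "b": 1.0, "c": 0.5}                  # local states
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}   # line network a-b-c
for _ in range(60):
    values = consensus_step(values, neighbours)
# after enough steps, all local values are (nearly) identical
```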

Faced with these problems, especially in the context of open MAS (i.e., systems formed by a dynamic population of agents provided by different developers), several researchers have argued in favour of using *human organizations* as a proper metaphor for engineering MAS [4–7]. Human organizations, whose typical examples are firms, clubs, corporations, etc., are *collectivities* pursuing *specific goals* and exhibiting *formalized social structures* [8]. Goals are specific to the extent that they are explicitly and clearly defined. Social structures are formalized in such a way that patterns of structuring and behaving (such as *roles*, *role relations*, *procedures*, *protocols*, *norms*, etc.) are precisely specified regardless of the personal traits and relations of any individual part of the organization. Thus, by conceiving a MAS as an organization—or, more generally, as a bigger system formed by several organizations, hereafter called *agent organizations*—the basic idea is to promote social order and consensus in a top-down fashion. The idea is to have the agents' actions and interactions governed by formalized "social structures", defined above the agent level, in order to enable the MAS (seen as a collective entity) to achieve definite global goals.

This emphasis on organizations as a suitable metaphor for engineering social order has led to several proposals of *organizational models* for MAS design [4]. From the perspective of software development, most of these organizational models can be characterized as *domain specific modeling languages* [9]. That is, they provide a specialized *conceptual structure* (metamodel) embodied in some *concrete syntax* (notation) by means of which the designer can write formal representations of the social structure of agent organizations. Such representations, called *organizational specifications*, are then used as specification artifacts driving the development of agents and MAS.

On the one hand, the existence of many organizational models favours the organizational design of MAS since, with various proposals, experience and best practices are accumulated. On the other hand, a great variety of organizational models introduces heterogeneity in the development of MAS. As a direct consequence of this heterogeneity, mainly in the case of open MAS, a new and important interoperability issue arises: If to enter and fully work in an organization the agents should be designed to "understand" and comply with an organizational specification of a given kind (i.e., conforming to some organizational model), then, in addition to the communication language and the domain ontology, the organizational model is something that the agents are supposed to share in order to properly work together. In other words, how can we provide means for a set of agents, immersed in a common environment, to evolve, reason, decide and interact with each other based on organizational concepts, since their organizational models may differ? In this paper, we call this issue the *organizational interoperability problem*.

We can think of four basic approaches to solving the organizational interoperability problem: *Standardization*, *universal agents*, *delegation* and *adaptation*. Standardization consists in providing interoperability by eliminating the root of the problem, the diversity, by means of a *standard model* that has to be accepted and used by all developers [10]. The universal agents approach implies the creation of agents that are able to deal with several different organizational models [11]. Delegation means creating specialized services in middleware layers (like proxies [12] and governors [13]) to which agents may delegate reasoning and decision mechanisms related to organizational issues. Adaptation, in its turn, is a solution based on the possibility of defining mappings between models [14]. From these mappings, an *adapter* is created: a component that converts specifications from one model to another [15,16].

Each of these approaches has its pros and cons. Standardization fully eliminates the problem, but it is politically and economically difficult to achieve and leaves aside legacy systems. Universal agents must be updated every time a new model is created or changed. Delegation practically eliminates the agents' organizational autonomy. Adaptation deals with legacy systems, but it is technically difficult to achieve, if not impossible, when there are no meaningful mappings between the models.
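The adaptation approach can be sketched as follows; the model names, concept vocabulary, and mapping are purely illustrative (real mappings between organizational models are far richer):

```python
# Minimal sketch of an adapter: it converts an organizational
# specification written for one (hypothetical) model into another via
# an explicit concept mapping, and reports concepts that have no
# meaningful counterpart in the target model.

MODEL_A_TO_B = {          # concept mapping between two invented models
    "role": "position",
    "group": "department",
    "mission": "task-set",
}

def adapt_spec(spec, mapping):
    """Translate every known concept key of a specification; keep the
    attached values (names, cardinalities, ...) untouched."""
    translated, unmapped = {}, []
    for concept, value in spec.items():
        if concept in mapping:
            translated[mapping[concept]] = value
        else:
            unmapped.append(concept)  # no counterpart in the target model
    return translated, unmapped
```

The `unmapped` list makes the limitation noted above concrete: when no meaningful mapping exists for a concept, the adapter can only flag it, not translate it.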

Motivated by the organizational interoperability problem and the basic approaches to it, all of which presuppose an integrated knowledge of the organizational models used to engineer agent organizations, the objective of this paper is to analyze the conceptual structures of several organizational models present in the literature and, based on this analysis, to propose an abstract and integrated view of the main concepts that have been used to specify agent organizations. We believe that the abstract view of organizational models we put forward can be used both as a basis for defining essential mappings and for future standardization efforts of organizational models.

This work builds on a previous work [17] in which we reviewed several prominent organizational models to answer the questions: How are the conceptual structures of the models related? Are there basic similarities? What are they? The answers we gave to these questions, in terms of *modeling dimensions*, are summarized in Section 2. The idea of modeling dimensions describes the basic similarities of organizational models in broad terms; it characterizes the macrostructure of organizational models.

In this paper we deepen our analysis by further exploring the conceptual structure within each modeling dimension. In this sense, we seek to characterize the common concepts and their relations found in existing organizational models along the modeling dimensions. The specific questions we propose to answer in this paper are: Inside each modeling dimension, what are the recurring modeling concepts? Is it possible to combine these recurring concepts into a coherent whole? How can this be used in a solution to the organizational interoperability problem? In approaching these questions, we have used techniques from Model Driven Engineering (MDE) [18]. Specifically, we think of organizational models as domain specific modeling languages whose conceptual structures are represented by means of metamodels. Thus, to address the questions systematically, we propose an iterative integration method, described in Section 3, aiming at building an integrated metamodel out of particular metamodels.

A central step in the integration method is the identification of correspondences between the conceptual structure of organizational models represented by metamodels. To assist this identification, in Section 4, we analyze the recurring concepts of existing organizational models along the modeling dimensions. The result is an *abstract conceptual structure* formed by the union of *conceptual patterns* found by comparing the organizational models. Relying on this abstract conceptual structure and using the proposed integration method, in Section 5 we show how to effectively integrate (part of) three existing organizational models. To put into perspective the integration of organizational models, we then discuss in Section 6 a solution based on adaptation for the organizational interoperability problem. In this solution, named MAORI (*Model-based Architecture for ORganizational Interoperability*) [19], the mappings between organizational models are defined indirectly by using the integration we have proposed.

In Section 7, we compare our proposal to related work in the literature. To the best of our knowledge, the systematic integrated analysis of organizational models we propose is novel, constituting the main contribution of the paper. Its importance, as already hinted, lies in serving as a common ground for aligning organizational models and as a starting point towards standardization. The MDE approach we apply is also a contribution and an advancement in the state of the art. Looking at the literature, we found that few organizational models are defined in terms of explicit metamodels. Our representation of existing organizational models and their integration by means of formal metamodels thus helps to further the understanding of their features and limitations. Finally, in Section 8, we present our conclusions and future work.

#### **2. Organizational Models for MAS**

This section presents organizational models for multiagent systems by classifying their content in modeling dimensions that were adopted to define a method for the integration of organizational models. As a result of using the method, we build an *abstract conceptual structure* to deal with the organizational interoperability problem within MAS.

#### *2.1. Modeling Dimensions*

After a detailed analysis of a significant part of the existing organizational models, we noted many similarities and complementarities in their conceptual structures. The common points identified were classified into recurring themes we have called *modeling dimensions* for agent organizations [17]. In what follows, we discuss the modeling dimensions identified. Then we move to a short overview of the various organizational models analyzed. Closing the section, we show a comparative table summarizing the models analyzed along the identified modeling dimensions.

#### 2.1.1. Fundamental Aspects of Systems

In general, designed systems exhibit some fundamental aspects that, from an engineering standpoint, are natural candidates for modeling. Firstly, there is the *functional behavior* of the system—the input (stimulus) to output (response) relations that couple the system to external elements composing the environment in which the system is situated. In modeling this aspect, the system is commonly depicted as a black box whose internal constitution, at first, does not matter. What really matters is that the environment imposes functional requirements, such as operations or tasks, that the system as a whole is supposed to perform. Further, these functional requirements may be subdivided in a recursive way until reaching atomic operations or tasks arranged in a given ordering (dependency graph). Later, when the innards are determined, the actual execution of the atomic operations or tasks can be associated with specific components of the system and their interactions. Modeling the functional behavior is a common practice both in developing computer systems and in representing organizational processes. Modeling techniques such as DFD (Data Flow Diagram) [20] and the activity diagrams of UML (Unified Modeling Language) [21] are typical examples of that.

Another fundamental aspect is the *internal structure* of the system. In contrast to functional modeling, to model the internal structure of a system means to represent it as a transparent box: To represent the breakdown of the system into its constituent parts (components and subsystems) and the relations interconnecting these parts. Like functional modeling, modeling the internal structure of a system is a recurring theme in system design. In software development, for example, the class diagrams and the component diagrams of UML serve this purpose. In the case of human organizations, a traditional form of structural modeling is the creation of organograms describing the divisions, roles and hierarchical relationships existing inside an organization.

A third candidate for modeling is the *structural behavior* of the system. Roughly, it consists of the "movement" of the internal structure of a system towards the realization of some desired functional behavior. Thus, when modeling the structural behavior, we also see the system as a transparent box. What we try to represent is the ordering of interactions occurring over time among the constituent parts of a system. These interactions make the system work, i.e., perform some expected task or operation in its environment. Examples of this type of modeling are the sequence and collaboration diagrams of UML.

#### 2.1.2. Primary Modeling Dimensions

From the premise that designed systems in general, not only agent organizations, exhibit these three fundamental aspects as natural modeling concerns, we have used them as a first classification scheme for separating the modeling concepts of organizational models into cohesive categories. Consequently, we define:

• The *functional dimension*, characterized by modeling concepts to represent the functional behavior of an agent organization, i.e., the operations or tasks the organization as a whole is supposed to perform in its environment.

• The *structural dimension*, characterized by modeling concepts to represent the internal structure of an agent organization, i.e., its constituent parts and the relations interconnecting them.

• The *interactive* (or *dialogical*) *dimension*, characterized by modeling concepts to represent the structural behavior of an agent organization, i.e., the ordering of interactions occurring over time among its constituent parts.
#### 2.1.3. Social Systems and the Normative Dimension

While the functional, structural and interactive dimensions can be justified by analysing the modeling of systems in general, they are not sufficient to classify all modeling concepts appearing in organizational models.

According to [22], three basic types of systems and corresponding models can be identified: *Deterministic*, i.e., systems and models in which neither the parts nor the whole are purposeful; *animated*, i.e., systems and models in which the whole is purposeful but the parts are not; and *social*, i.e., systems and models in which both the whole and the parts are purposeful. A fourth type is also considered, *ecological*, i.e., systems and models in which the parts are purposeful but the whole is not. Given this classification, we can say that traditional software systems are deterministic, autonomous agents are animated, and agent organizations are social. Bigger and more encompassing MAS aggregating agents and agent organizations form ecological systems.

Being deterministic, traditional software systems tend to have an architecture in which the functional behavior, the internal structure and the structural behavior are foreseen in detail. Their components are not conceived as purposeful entities with autonomous behaviors. On the contrary, they are designed to obey rigidly what is fixed in the architectural specification of the system.

Regarding agent organizations, characterized as social systems, the idea of agents rigidly obeying the prescriptions of functional, structural and interactive specifications is not realistic. Agents are conceived as self-interested components, especially in open MAS. Therefore, the functional, structural and interactive specifications cannot be so detailed as to precisely determine the minutiae of the agents' joint structuring and behavior, nor can one assume benevolence from the agents with respect to the organizational goals.

In this context, our analysis is that the specification of *norms* (permissions, prohibitions, obligations, etc.), as occurs in the design of human organizations, is also expected to show up in organizational models. Norms work as a complementary mechanism, helping to couple the agents to the organization more flexibly. On the one hand, norms provide explicit means to capture interdependencies among the functional, structural and interactive aspects (e.g., an agent playing a given role in the internal structure is obliged to behave functionally or structurally in a given way). On the other hand, norms can be used to explicitly regulate sanctions or penalties for deviant behavior.

Accordingly, we define a fourth and last category for the analysis of organizational modeling concepts:

• The *normative dimension*, characterized by modeling concepts to further restrict, regulate and interrelate elements from the other modeling dimensions, given the expected autonomous behavior of the agents.

#### *2.2. Models Review*

Now we pass to a quick description of concrete organizational models, taking into account how they cover the four modeling dimensions identified. We describe six models—TAEMS [23], AGR [5], STEAM [24], MOISE+ [25], ISLANDER [26] and OPERA [27]. We regard these as good exemplars showing how models have evolved towards full coverage of the organizational modeling dimensions. Other models are mentioned at the end of the review.

#### 2.2.1. TAEMS

In TAEMS (Task Analysis, Environment Modeling, and Simulation) the basic modeling concept is the notion of *task*. In essence, using TAEMS we can specify *task structures* composed of definitions of *tasks*, *resources*, *task relationships*, and *task groups*. A task group is an independent collection of interrelated tasks. There are two kinds of task relationships: *Subtask* and *non-local effect* relationships. The subtask relationship links a parent task to a child task, explicitly defining a task decomposition tree. Individual tasks that do not have child tasks are called *methods*. Methods are primitive tasks that agents should be able to perform. Non-local effects are task relationships that have positive or negative effects on the quality, cost or duration of the related tasks. Examples of possible non-local effects are *facilitates*, *enables*, *hinders* and *limits*.
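To make the shape of a TAEMS task structure concrete, the following is a minimal Python sketch. The task names, the dict-based encoding and the `methods` helper are illustrative assumptions, not the actual TAEMS representation:

```python
# A task decomposition tree: parent task -> child tasks.
subtasks = {
    "buy-product": ["get-product-info", "pay"],
}

# Non-local effects: (kind, source task, target task).
nles = [
    ("enables", "get-product-info", "pay"),
]

def methods(task: str) -> list[str]:
    """Leaf tasks (no child tasks) are methods, i.e. primitive
    tasks that agents should be able to perform."""
    children = subtasks.get(task, [])
    if not children:
        return [task]
    return [m for c in children for m in methods(c)]

assert methods("buy-product") == ["get-product-info", "pay"]
```

Here the task group consists of a single decomposition tree rooted at `buy-product`, plus one *enables* non-local effect stating that product information is a prerequisite for paying.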

TAEMS is a model specialized exclusively in the specification of the functional behavior of agent organizations. A TAEMS specification, the task structure, only represents what should be done by the agents alone (method definitions) or in groups (task groups). It tells nothing about the internal structuring or the explicit interactions needed to realize the specified tasks.

#### 2.2.2. AGR

AGR (Agent, Group, Role) is the evolution of the AALAADIN model [28]. In AGR *agent*, *group* and *role* are the primitive modeling concepts. An agent is an active, communicating entity playing one or more roles within one or more groups. No constraints are placed upon the architecture of an agent or about its mental capabilities. A group is a set of agents sharing some common characteristics. A group is the context for a pattern of activities and is used for partitioning organizations. An agent can participate at the same time in one or more groups. Agents may communicate if and only if they belong to the same group. A role is the abstract representation of a functional position of an agent in a group. Roles are local to groups, and a role must be requested by an agent.

AGR is a model providing a minimalist structural view of organizations. There are no concepts for modeling functional behavior. The specification of an organization, called *organizational structure*, is in essence the depiction of the internal structure of the organization in terms of roles, role constraints and group structures. AGR also states that agents can have their joint behavior orchestrated by interaction protocols, but the nature of, and the primitives to describe, such protocols are left open.
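The AGR constraint that agents may communicate if and only if they share a group can be sketched as follows. All class, agent and group names are hypothetical, and role handling is deliberately simplified:

```python
class Organization:
    """Minimal AGR-style sketch: agents enter groups (playing roles
    local to those groups); two agents may communicate only if they
    belong to at least one common group."""

    def __init__(self) -> None:
        self.memberships: dict[str, set[str]] = {}  # agent -> groups

    def enter(self, agent: str, group: str, role: str) -> None:
        # In AGR a role must be requested by the agent and is local
        # to the group; role bookkeeping is omitted for brevity.
        self.memberships.setdefault(agent, set()).add(group)

    def may_communicate(self, a: str, b: str) -> bool:
        groups_a = self.memberships.get(a, set())
        groups_b = self.memberships.get(b, set())
        return bool(groups_a & groups_b)

org = Organization()
org.enter("alice", "sales", "seller")
org.enter("bob", "sales", "buyer")
org.enter("carol", "logistics", "carrier")
assert org.may_communicate("alice", "bob")
assert not org.may_communicate("alice", "carol")
```

The sketch shows how groups partition an organization: `alice` and `carol` share no group, so AGR's communication rule keeps them isolated from each other.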

#### 2.2.3. STEAM

STEAM (a Shell for TEAMwork) is a model whose focus is teamwork. In STEAM an agent organization is conceived as an *agent team*. Two separate hierarchies are used to specify the internal structure and functional behavior of a team: A *subteam* and *roles* hierarchy (or *organization hierarchy*), and a hierarchy of joint activities (or *operator hierarchy*). The subteam and roles hierarchy is a tree in which the root represents a team, the internal nodes the possible subteams and the leaves the individual agent roles. The joint activity hierarchy is also a tree whose nodes are called *operators*. Leaf operators represent atomic activities. Internal operators represent a *reactive plan*, i.e., the decomposition of an activity into interrelated subactivities. Each individual role or subteam is assigned one or more operators from the activity hierarchy.

With STEAM we see a first model that combines the structural (subteams and role hierarchy) and functional modeling dimensions (operator hierarchy).

#### 2.2.4. MOISE+

MOISE+ (Model of Organization for multI-agent SystEms) is a model that explicitly divides the specification of an agent organization into three parts: The *structural*, the *functional* and the *deontic* specifications. The structural specification defines the internal structuring of agents through the notions of *roles*, *role relations* and *group specifications*. A role defines a set of constraints the agent has to accept to enter a group. Role relations are *links* (*communication*, *acquaintance* and *authority*) and *compatibilities* from a source role to a target role. A group specification consists of role definitions, subgroup definitions (group decomposition), link and compatibility definitions, role cardinalities and subgroup cardinalities. The functional specification describes how an agent organization usually achieves its *global goals*, i.e., how these goals are decomposed (by *plans*) and distributed to the agents (by *missions*). Global goals, plans and missions are specified by means of a *social scheme*. A social scheme can be seen as a goal decomposition tree where the root is a global goal, the internal nodes are *plan operators* (sequence, choice, parallel) to decompose goals into subgoals, and the leaves are atomic goals that can be achieved by an individual agent. Missions are coherent sets of goals; hence, an agent that is committed to a mission is responsible for the satisfaction of all its component goals. Finally, the deontic specification associates roles to missions by means of *permissions* and *obligations*.

Like STEAM, MOISE+ addresses both the functional and structural dimensions of modeling. However, MOISE+ goes further and provides concepts for modeling normative aspects (deontic specification). The deontic concepts allow a flexible coupling between the functional and structural specifications that is not seen in STEAM.
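As a rough illustration of how a deontic specification couples the structural and functional parts, consider this Python sketch. The goal, mission and role names are invented, and the dict encoding is ours; MOISE+ itself uses a dedicated specification syntax, not this representation:

```python
# Social scheme (functional part): goal -> subgoals via a plan.
social_scheme = {
    "buy-product": ["search", "pay"],
}

# Missions (coherent sets of goals) distributed over the scheme.
missions = {
    "m-search": {"search"},
    "m-pay": {"pay"},
}

# Deontic specification: (modality, role, mission).
deontic = [
    ("obligation", "buyer", "m-pay"),
    ("permission", "buyer", "m-search"),
]

def obligations_of(role: str) -> set[str]:
    """Goals a role-playing agent is responsible for through the
    missions it is obliged to commit to."""
    goals: set[str] = set()
    for modality, r, mission in deontic:
        if r == role and modality == "obligation":
            goals |= missions[mission]
    return goals

assert obligations_of("buyer") == {"pay"}
```

The point of the deontic layer is visible here: the structural side (the `buyer` role) and the functional side (the `pay` goal) are only connected indirectly, through an obligation on a mission, which is what makes the coupling flexible.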

#### 2.2.5. ISLANDER

ISLANDER is a declarative language for specifying *electronic institutions*. According to ([26] p. 348), "Institutions establish how interactions of a certain sort will and must be structured within an organization". In ISLANDER, an electronic institution is composed of four basic elements: A *dialogic framework*, *scene* definitions, a *performative structure*, and *norm* definitions. The dialogic framework defines the participating *roles* and their *relationships*. Each role defines a pattern of behavior within the institution, and any agent within an institution is required to adopt some of them. A scene is a collection of agents playing different roles, interacting with each other in order to realize a given activity. Every scene follows a well-defined *communication protocol*. The performative structure establishes relationships among scenes; the idea is to specify a network of scenes that characterizes more complex activities. The norms component of an electronic institution defines the *commitments*, *obligations* and *rights* of the participating roles.

With ISLANDER we perceive a change of focus from functional to structural behavior. Unlike the previous models, there are no concepts for explicitly modeling goals and plans (goal decompositions). All behavior is specified by means of direct interactions between roles (dialogs) and regulated by norm definitions.
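The idea of a performative structure as a network of scenes can be sketched as a small directed graph. The scene names and the `reachable` helper are illustrative assumptions, not ISLANDER syntax:

```python
# A performative structure reduced to scenes (nodes) and
# scene transitions (directed edges).
scenes = {"admission", "negotiation", "settlement"}
transitions = {
    ("admission", "negotiation"),
    ("negotiation", "settlement"),
}

def reachable(start: str) -> set[str]:
    """Scenes an agent can reach from `start` by following the
    transitions of the performative structure."""
    seen, frontier = {start}, [start]
    while frontier:
        scene = frontier.pop()
        for src, dst in transitions:
            if src == scene and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

assert reachable("admission") == {"admission", "negotiation", "settlement"}
```

Composing activities by wiring scenes together in this way, rather than by decomposing goals, is precisely the shift from functional to structural behavior noted above.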

#### 2.2.6. OPERA

In OperA (Organizations per Agents) an agent organization is specified in terms of four structures: The social, the interaction, the normative and the communicative structures. In the *social structure* are defined *roles*, *objectives*, *groups* and *role dependencies*. Roles identify activities and services necessary to achieve social objectives. Groups provide means to collectively refer to a set of roles. Role dependencies describe how the roles are related in terms of objective realization. The *interaction structure* defines how the main activity of an agent organization is supposed to happen. This definition is done in terms of *scenes*, *scene scripts*, *scene transitions* and *role evolution relations*. Scenes are representations of specific interactions. A scene script is described by its players (roles or groups), scene norms (expected behavior of actors in a scene) and a desired interaction pattern. Scene transitions are used to coordinate scenes by defining the ordering and synchronization of the scenes. Role evolution relations specify the constraints that hold for the role-enacting agents as they move from scene to scene respecting the defined transitions. The normative structure gathers all the norms that are defined during the specification of roles, groups, and scene scripts. Norms are specified as formal logical expressions. Finally, the communicative structure describes the set of performatives and the domain concepts used in the interaction structure by the role enacting agents.

OPERA is a model that addresses all the identified modeling dimensions. Nevertheless, we note that the functional and structural modeling of OPERA is less developed than in the other models, its interactive modeling is comparable to what is found in ISLANDER, and its normative modeling is the most elaborated of all the models analysed. In OPERA, norms are expressed in a formalism called LCR (Logic for Contract Representation).

#### 2.2.7. Other Models

The literature on organizational models is vast. For reasons of space and scope, we briefly mention other models below:


In addition to these, we also find in the literature on agent-oriented software engineering (AOSE) methodologies a strong concern with organization modeling during the analysis and design of MAS. Gaia [37] and Tropos [38] are examples of AOSE methodologies that incorporate the concept of organization in their metamodels.

#### *2.3. Models Comparison*

More than large differences, we perceive several similarities and complementarities among the conceptual structures of the organizational models analysed, as shown in Table 1. The commonalities occur at two levels. At a first, macro level, they occur as the dimensions of organizational modeling that were identified. For example, all models except TAEMS present concepts to represent the internal structure of organizations (structural dimension). All models except AGR and ISLANDER support functional behavior modeling (functional dimension). ISLANDER and OPERA present very similar concepts for representing the structural behavior of organizations (dialogical dimension). Normative concepts appear in MOISE+, ISLANDER and OPERA (normative dimension).


**Table 1.** Organizational models comparison.

At a more detailed level of analysis, one can still identify various modeling patterns within each dimension. In the next section, we propose an iterative integration method that relies on these patterns, whose formal description is presented in Section 4.

#### **3. Method for the Integration of Organizational Models**

We advocate that modeling dimensions for agent organizations are useful not only to analyse and compare organizational models, but also as a starting point for their *conceptual integration*. As mentioned in the Introduction, when the objective is to provide organizational interoperability, a consistent conceptual integration of organizational models is a fundamental and necessary element. In order to perform such integration systematically (having in mind the organizational interoperability problem), we have defined an iterative integration method that is discussed in this section.

#### *3.1. General Process*

Let *OM*1, *OM*2, ..., *OMn* be *n* organizational models to be integrated. In broad lines, we understand by a *conceptual integration* of *OM*1,..., *OMn* the process of representing, correlating and joining the *conceptual structure* (i.e., the modeling concepts and their interrelationships) of each *OMi* obtaining as the final result an *integrated metamodel MMint* whose conceptual structure subsumes the structure of all *OMi*. This idea of conceptual integration, as an iterative process, is depicted in Figure 1.

**Figure 1.** General conceptual integration method.

More specifically, for *n* models we have *n* − 1 iterations. In the first one, three sequential steps are performed for *OM*1 and *OM*2:

1. *Representation*: The conceptual structures of *OM*1 and *OM*2 are represented by means of metamodels *MM*1 and *MM*2;
2. *Alignment*: Correspondences between the concepts of *MM*1 and *MM*2 are identified, forming the articulation *art*(*MM*1, *MM*2);
3. *Merging*: *MM*1 and *MM*2 are joined, modulo *art*(*MM*1, *MM*2), producing the integrated metamodel *MMint*#1.
From the second iteration onward, the same three steps are performed for each *OMi* (3 ≤ *i* ≤ *n*), with the following differences: (1) Instead of *MM*1 and *MM*2 we have *MMi* and *MMint*#(*i*−2), respectively, where *MMi* is the metamodel representing *OMi* and *MMint*#(*i*−2) is the integrated metamodel from the previous iteration; (2) in the place of *art*(*MM*1, *MM*2) we have *art*(*MMi*, *MMint*#(*i*−2)), the articulation between *MMi* and *MMint*#(*i*−2); and (3) instead of *MMint*#1 we produce *MMint*#(*i*−1), i.e., the integrated metamodel resulting from joining *MMi* to *MMint*#(*i*−2).
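The iteration scheme above can be summarized as a fold over the models. The sketch below assumes `represent`, `align` and `merge` helpers with the indicated signatures and abstracts away all metamodel details:

```python
def integrate(models, represent, align, merge):
    """Fold n organizational models into one integrated metamodel.

    Assumed helper signatures (hypothetical, for illustration):
      represent(OM)           -> metamodel MM
      align(MM_a, MM_b)       -> articulation art(MM_a, MM_b)
      merge(MM_a, MM_b, art)  -> integrated metamodel
    """
    metamodels = [represent(om) for om in models]
    mm_int = metamodels[0]
    for mm in metamodels[1:]:        # n - 1 iterations
        art = align(mm, mm_int)      # heuristic step (Section 3.3)
        mm_int = merge(mm, mm_int, art)  # deterministic step (Section 3.4)
    return mm_int
```

With toy "metamodels" that are plain concept sets, alignment as set intersection and merging as set union, `integrate` reduces to an incremental union of all concepts, which mirrors how the integrated metamodel grows iteration by iteration.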

#### *3.2. Metamodel Representation*

In the first step, we assume a metamodel-based representation of the organizational models. In this sense, our method adopts the way by which special-purpose modeling languages are defined in the area of Model-Driven Engineering. Given this assumption, there are several metamodeling languages available for expressing the conceptual structure of the organizational models. Some of these languages are KM3 [39], MOF/OCL [40,41], XMF [42] and Ecore [43]; the last is used in this work, as illustrated in Sections 4 and 5.

#### *3.3. Metamodel Alignment*

In the second step, the definition of correspondences between modeling concepts is an inherently heuristic process. One possible heuristic is to use the modeling dimensions identified in Section 2. The basic idea is to divide the work along the modeling dimensions. For each model, we start by classifying its modeling constructs in one or more dimensions. Then, for each dimension covered by the models, we identify the corresponding modeling constructs. In this way, the functional modeling concepts of one model are put into correspondence with the functional concepts of the other model, the structural concepts of one model with the structural concepts of the other, and so on.

Another heuristic we put forward for aligning the conceptual structure of organizational models is to take into account some basic conceptual patterns found in the models. These patterns are described in Section 4 in the form of an abstract organizational model.

#### *3.4. Metamodel Merging*

Unlike the alignment, the merging of metamodels is a more deterministic process that can be fully automated using several algorithms reported in the literature [44–46]. In general, these proposals for (meta)model merging can be described as merging based on *graphs* and *morphisms*. In this case, the metamodels are abstractly conceived as graphs, and the correspondences between two metamodels assume the form of an articulation between graphs. If *MM*1 and *MM*2 are two metamodels viewed as graphs, then an articulation *art*(*MM*1, *MM*2) between them is a triple composed of a graph *Gart* and two morphisms *m*1 : *Gart* → *MM*1 and *m*2 : *Gart* → *MM*2:

$$\mathit{art}(MM\_1, MM\_2) = \langle G^{art}, m\_1, m\_2 \rangle$$

Intuitively, the idea is that *Gart* is a representation of the common concepts and relations found in *MM*1 and *MM*2, and the morphisms *m*1 and *m*2 are the links mapping these common concepts and relations to their counterparts in both *MM*1 and *MM*2.

Once characterized as graphs, the merging of two metamodels *MM*1 and *MM*2 is in essence an *amalgamated sum* (or *pushout*) of *MM*1 and *MM*2, modulo *art*(*MM*1, *MM*2):

$$\mathit{merge}(MM\_1, MM\_2) = MM\_1 \oplus\_{\mathit{art}(MM\_1, MM\_2)} MM\_2 = \langle MM^{int}, m'\_1, m'\_2 \rangle$$

where *MMint* is the resulting integrated metamodel at the end of the iteration and *m*′1 : *MM*1 → *MMint* and *m*′2 : *MM*2 → *MMint* are morphisms. The integrated metamodel *MMint* consists of the union of the nodes (concepts) and edges (concept relationships) of *MM*1 and *MM*2, where correspondent elements, as described via *art*(*MM*1, *MM*2), are treated as only one element [44,46]. In this way, *MMint* retains all non-duplicate information in *MM*1 and *MM*2, collapsing the elements that *art*(*MM*1, *MM*2) declares redundant. The morphisms *m*′1 and *m*′2 describe how to translate from the particular metamodels to the integrated one. The inverse, translating from the integrated to the particular metamodels, is performed via *Gart*, *m*1 and *m*2.
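A toy version of this merging step, with metamodels reduced to node sets plus labeled edges and the articulation reduced to pairs of corresponding concept names (the output morphisms are omitted), might look like:

```python
def merge(mm1, mm2, art):
    """Amalgamated sum (pushout) sketch. A metamodel is a pair
    (nodes, edges) with edges as (source, label, target) triples;
    `art` lists pairs (name_in_mm1, name_in_mm2) of concepts the
    articulation declares redundant."""
    nodes1, edges1 = mm1
    nodes2, edges2 = mm2
    # Rename MM2's redundant concepts to their MM1 counterparts,
    # so that corresponding elements collapse into one.
    rename = {b: a for a, b in art}
    r = lambda n: rename.get(n, n)
    nodes = set(nodes1) | {r(n) for n in nodes2}
    edges = set(edges1) | {(r(s), lbl, r(t)) for s, lbl, t in edges2}
    return nodes, edges

# Hypothetical fragments of two metamodels: 'Goal' and 'Objective'
# are declared to be the same concept by the articulation.
mm1 = ({"Goal", "Plan"}, {("Plan", "decomposes", "Goal")})
mm2 = ({"Objective", "Scene"}, {("Scene", "realizes", "Objective")})
merged = merge(mm1, mm2, [("Goal", "Objective")])
assert merged[0] == {"Goal", "Plan", "Scene"}
```

The result retains all non-duplicate nodes and edges of both inputs, with `Objective` collapsed into `Goal`, which is exactly the behavior of the amalgamated sum described above, minus the bookkeeping of the morphisms *m*′1 and *m*′2.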

#### **4. Abstract Conceptual Structure of Organizational Models**

In this section we compare in detail the conceptual structures of the organizational models discussed in Section 2. With this comparison, we intend to show two main findings explicitly: (*i*) We can identify patterns in the conceptual structure of organizational models inside each modeling dimension, if we homogenize the terminology used and abstract away some particularities of each model; and (*ii*) the patterns identified can be consistently combined into a single conceptual structure (metamodel) that represents, in an essential and integrated way, the conceptual structures of organizational models. In the next subsections, we detail the basic patterns that emerge when one looks more closely at the conceptual structures of the organizational models proposed in the literature. Each subsection focuses on a modeling dimension previously discussed in Section 2. At the end, we combine the conceptual patterns, obtaining in this way the abstract organizational metamodel.

#### *4.1. Functional Dimension*

In the functional dimension, we found concepts for the specification of the functional behavior of an agent organization, i.e., the collective behavior of agents when the internal structure of their organization is not taken into account. Looking at Table 1, we can see that this dimension occurs in TAEMS, STEAM, MOISE+ and OPERA. In these models, the functional specifications follow a general pattern which is illustrated in Figure 2.

#### 4.1.1. Graphs of Hierarchical Plans and Goal Relationships

In essence, the general pattern can be characterized as directed graphs whose edges represent:

- Either the acyclic decomposition of a *goal* (*operator*, *objective* or *task*) into *subgoals* (*suboperators*, *subobjectives* or *subtasks*), giving rise to the notion of *hierarchical plans*,
- or binary relationships between *goals* (*operators*, *objectives* or *tasks*), like the *depends* relation in STEAM or the *non-local effects* in TAEMS.

Further, in each graph there is one *root node* that corresponds to a primary goal whose planning and future achievement are prescribed by the structure of the graph. Such graphs of hierarchical plans and goal relationships receive the names of "task group" in TAEMS, "operator hierarchy" in STEAM, "social scheme" in MOISE+ and "role objective definition" in OPERA.
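The common shape of these graphs can be captured by a handful of classes. The sketch below mirrors the pattern informally; the class and field names are ours, chosen to echo the conceptual pattern, and are not taken from any of the models:

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    """'Goal' stands for operator, objective or task, depending on the model."""
    name: str

@dataclass
class GoalDecomposition:
    """Acyclic decomposition of a goal into subgoals (hierarchical plan)."""
    parent: Goal
    subgoals: list[Goal]
    kind: str = "and"  # concrete subtype, e.g. sequence / choice / parallel

@dataclass
class GoalRelationship:
    """Binary relationship between goals, e.g. 'depends' or 'enables'."""
    kind: str
    source: Goal
    target: Goal

@dataclass
class FunctionalSpec:
    """A functional specification: a graph rooted at a primary goal."""
    root: Goal
    decompositions: list[GoalDecomposition] = field(default_factory=list)
    relationships: list[GoalRelationship] = field(default_factory=list)

# A tiny instance in the spirit of the electronic-market example.
buy = Goal("buy-product")
search, pay = Goal("search"), Goal("pay")
spec = FunctionalSpec(
    root=buy,
    decompositions=[GoalDecomposition(buy, [search, pay], "sequence")],
    relationships=[GoalRelationship("enables", search, pay)],
)
```

A "task group", "operator hierarchy", "social scheme" or "role objective definition" would each instantiate this structure with its own concrete decomposition and relationship kinds.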

In Figure 2, the conceptual pattern identified in the functional dimension is represented by means of an Ecore metamodel. The classes and references composing the metamodel are described in Table 2. Additional contextual constraints are presented in Table 3. These are written in OCL and formalize the static semantics of the metamodel (conceptual pattern).

**Figure 2.** Similar functional specifications written in TAEMS, OPERA, STEAM and MOISE+, which prescribe how an agent is supposed to proceed for buying a product in an electronic market (example taken from [47]). The class diagram (Ecore metamodel) in the center of the figure represents the conceptual pattern identifiable in the various approaches of functional modeling. The dotted arrows detail what concepts are captured by what classes of the metamodel.

**Table 2.** Functional specification pattern: Classes description.




#### 4.1.2. Particularities

Besides the common aspects represented in the functional specification pattern (Tables 2 and 3), there are some particular aspects in the organizational models that should be highlighted.

In TAEMS, there are the concepts of *resources* and *non-local effects between tasks and resources*. These concepts were not considered as part of the functional specification pattern presented because they occur only in TAEMS.

In MOISE+, there is no notion of *binary relationships between goals*. There are, however, the particular notions of *mission* and *preferences among missions*, which do not occur in the other organizational models analysed, but only in MOISE+. Thus, like the concepts of *resources* and *resource non-local effects* of TAEMS, the notions of *mission* and *mission preferences* also do not appear in the functional specification pattern presented.

In OPERA, the functional modeling is done implicitly as part of a *role definition* (structural modeling). As a consequence, OPERA's functional modeling offers fewer resources than what can be found in TAEMS, STEAM and MOISE+. Quoting [27], a *role definition* is done by specifying one or more *role objectives* *γ*, and

"[each] role objective *γ* can be further described by specifying a set of subobjectives that must hold in order to achieve objective *γ*. Subobjectives give an indication of how an objective should be achieved, that is, describe the states that must be part of any plan that an agent enacting the role will specify to achieve that objective. However, subobjectives abstract from any temporal issues that must be present in a plan, and as such must not be equated with plans." (pp. 60–61)

From this passage, we conclude that there is no explicit notion of *hierarchical plans* in OPERA. Even so, the general notion of *goal decomposition* can be identified in OPERA, as we have done in the previous subsection, by making it correspond to the notion of *subobjectives specification* (see Figure 2).

Lastly, we note that the abstract concepts of *goal decomposition* and *goal relationships* admit concrete subtypes of various kinds in the organizational models analysed. For instance, in TAEMS, the kinds of goal decomposition are denominated *quality accumulation functions* (*qafs*) and the kinds of goal relationships are named *non-local effects* (*nles*). Examples of *qafs* are q\_seq\_last (all subtasks must be completed in order, and overall quality is the quality of the last task), q\_sum\_all (all subtasks must be completed in no specific order, and overall quality is the aggregate quality of all subtasks) and q\_exactly\_one (only one subtask may be performed, and overall quality is the quality of the single task performed). Regarding specific *nles*, two of them are *facilitates* (when information from one task reduces or changes the search space, making some other task easier to solve) and *enables* (when information from one task is a prerequisite for doing another task). In STEAM, there are two basic concrete subtypes of goal decomposition: AND (when all suboperators must be done to realize a given operator) and OR (when at least one suboperator must be done to realize a given operator), which are complemented by the *depends* relationship (that establishes a partial order for doing operators). Finally, in MOISE+, there is no concept of *goal relationship*, only three subtypes of goal decomposition: *Sequence* (when subgoals must be achieved in some order to achieve a given goal), *choice* (when only one subgoal must be achieved to fulfill a given goal), and *parallelism* (when all subgoals can be pursued at the same time in order to achieve a given goal).
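As an illustration, the three *qafs* can be sketched as plain functions over subtask qualities. The encoding of qualities as numbers listed in execution order is our assumption, not TAEMS's actual representation; the function names mirror the *qafs* named in the text.

```python
# Illustrative sketch of TAEMS quality accumulation functions (qafs);
# subtask qualities are assumed to be numbers in execution order.
# This is not a TAEMS implementation, only a reading of the definitions.

def q_seq_last(qualities):
    """All subtasks completed in order; overall quality is the last one's."""
    return qualities[-1]

def q_sum_all(qualities):
    """All subtasks completed in any order; overall quality is the sum."""
    return sum(qualities)

def q_exactly_one(qualities):
    """Exactly one subtask performed; overall quality is its quality."""
    if len(qualities) != 1:
        raise ValueError("q_exactly_one expects a single subtask result")
    return qualities[0]
```

Each function aggregates the qualities of the subtasks of a decomposed task into the quality of the parent task, which is the role *qafs* play in a TAEMS task group.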

#### *4.2. Structural Dimension*

Forming the structural dimension, we have modeling concepts used to specify the internal structure of an agent organization in which the agents must engage to become an active member of the organization. From Table 1, five organizational models provide concepts for creating structural specifications. They are: AGR, TAEMS, MOISE+, ISLANDER and OPERA. Looking carefully at the structural specifications one is able to produce using these models, like the ones shown in Figure 3, we realize that the structural dimension of organizational modeling can also be characterized by an abstract conceptual pattern.

#### 4.2.1. Graphs of Roles and Groups

In the structural dimension, the organizational models present three fundamental concepts: Role, group and role relationships. The instantiation of these three interrelated concepts forms the specification of the internal structure of agent organizations.

In essence, structural specifications can be characterized as directed graphs where the nodes correspond:

	- **–** either to the definition of *groups* (in AGR, MOISE+ and OPERA) or *teams* (in STEAM),
	- **–** or to the definition of *roles* (in all models);

and where the edges represent:

	- **–** either the *decomposition of a group into subgroups* (or of a team into subteams), forming a *group (team) hierarchy*,
	- **–** or binary *relationships between roles*,
	- **–** or *links from a role to a group* (or *subteam*) in which the role can be played by agents.

**Figure 3.** Similar structural specifications written in STEAM, AGR, OPERA, MOISE+ and ISLANDER, for a team of agents taking part in a simulated soccer game (example taken from [24,25]). The team is composed of eleven *players* (one *goalkeeper*, three *backs*, five *midfielders* and three *attackers*) and one *coach*, and is divided into three groups: *Defense*, *midfield* and *attack*. The class diagram (Ecore metamodel) in the center of the figure represents the conceptual pattern identifiable in the various approaches of structural modeling. The dotted arrows detail what concepts are captured by what classes of the metamodel.

With regard to the *group* (*team*) *hierarchy*, there are two situations present in the models. On the one hand, we have a unique *root group* that represents the organization as a whole (the hierarchy is a rooted tree), which is the case in STEAM, MOISE+ and OPERA. On the other hand, there is no explicit *root group*, nor even the decomposition of groups into subgroups, which is the case in AGR. Regarding the *binary relationships between roles*, in general they are directed and, in the models AGR, MOISE+ and ISLANDER, they are subdivided into various kinds with diverse interpretations.

Such graphs of roles and groups receive the names of "organization hierarchy" in STEAM, "organizational structure" in AGR, "structural specification" in MOISE+. In ISLANDER, they are part of the definition of a "dialogic framework". In OPERA, they form the "social structure".

In Figure 3, the conceptual pattern identified in the structural dimension is represented by means of an Ecore metamodel. This metamodel is presented in more detail in Table 4.

**Table 4.** Structural specification pattern.

| Class | Description |
|-------|-------------|
| SSpec | Represents the concept of structural specification, i.e., organizational specifications restricted to the structural dimension. |
| GroupDef | Part of a structural specification that represents a group definition. Each group definition is characterized by the declaration of the roles that agents can play in the group (the RoleDef referenced by GroupDef::roleDef), and by possible subgroup definitions (referenced by GroupDef::subGroup). Abstracts the concepts of group structure in AGR, subteam role in STEAM, group specification in MOISE+, and group definition in OPERA. |
| RoleDef | Part of a structural specification that represents a role definition. Abstracts the concepts of role in AGR, MOISE+, ISLANDER and OPERA, and individual role in STEAM. |
| RoleRel | Part of a structural specification that represents a directed relationship between two roles: a source role referenced by RoleRel::source, and a target role referenced by RoleRel::target. Abstracts the concepts of role constraints in AGR, role relations and inheritance in MOISE+, and static separation of duties (ssd) and subroles in ISLANDER. |

*Auxiliary definitions.* GroupDef::getAllSubGroup() : Set(GroupDef), a query that returns the set of all group definitions that, directly or indirectly, are subgroups of a given GroupDef via the GroupDef::subGroup reference. In OCL:

```
context GroupDef
def: getAllSubGroup() : Set(GroupDef)
          = subGroup -> union(
                     subGroup -> collect(sg | sg.getAllSubGroup()));
```

*Contextual constraints.* (1) Every group definition shall reference at least one role definition or one subgroup definition, or both. In OCL:

```
context GroupDef
inv: roleDef -> notEmpty() or subGroup -> notEmpty()
```

(2) No group definition can be, directly or indirectly, a subgroup of itself, i.e., the group definitions shall form an acyclic directed graph. In OCL:

```
context GroupDef
inv: not self.getAllSubGroup() -> includes(self)
```
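The query and the two constraints can be mimicked in plain Python; the class and reference names follow Table 4, while the list-based encoding (and the visited-set guard, which keeps the query terminating even on ill-formed cyclic inputs) is our own sketch.

```python
# Sketch of the GroupDef portion of the structural pattern, mirroring
# the OCL query getAllSubGroup() and constraints (1) and (2) of Table 4.
# Roles are stood in for by plain strings for brevity.

class GroupDef:
    def __init__(self, name):
        self.name = name
        self.role_def = []   # roles declared in this group
        self.sub_group = []  # direct subgroup definitions

    def get_all_sub_group(self, _seen=None):
        """Direct and indirect subgroups (OCL query getAllSubGroup)."""
        seen = _seen if _seen is not None else set()
        result = set()
        for sg in self.sub_group:
            if sg not in seen:      # guard against cyclic (ill-formed) input
                seen.add(sg)
                result.add(sg)
                result |= sg.get_all_sub_group(seen)
        return result

    def is_well_formed(self):
        """Contextual constraints (1) and (2) of Table 4."""
        has_content = bool(self.role_def) or bool(self.sub_group)
        acyclic = self not in self.get_all_sub_group()
        return has_content and acyclic

team = GroupDef("team")
defense = GroupDef("defense")
team.sub_group.append(defense)
defense.role_def.append("back")
```

Appending `team` as a subgroup of `defense` would create a cycle, and `is_well_formed()` would then report the violation of constraint (2).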
#### 4.2.2. Particularities

Besides the similarities that give rise to the structural specification pattern, the organizational models also differ in some particular points. Four of these points deserve mention.

The first one is related to the definition of subgroups. In STEAM and MOISE+, the definition of subgroups forms a real hierarchy, i.e., an acyclic graph. On the other hand, in AGR and OPERA, there are no explicit subgroup relationships between group definitions. From the perspective of the structural specification pattern, this fact can be expressed in the following way: In AGR and OPERA, for each group definition g (instance of GroupDef), the collection of its subgroups is empty, i.e., g.subGroup = {}. On the other hand, in STEAM and MOISE+, there exist group definitions g such that g.subGroup ≠ {}.

The second point is the notion of *cardinality*, which is found in AGR and MOISE+ but not in the other models. Cardinalities can be defined for roles or subgroups. In the case of roles, cardinalities indicate a maximum and a minimum number of agents allowed per role in the context of a group. Regarding cardinalities of groups, they determine how many subgroups of a given type can be created in the context of a group. In AGR, cardinalities are attributes of role and group. In MOISE+, they are attributes of the association between role and group, or group and subgroup. For this reason, and observing that they are not an explicit feature of the majority of the models, we have chosen not to explicitly represent cardinalities in the structural specification pattern.
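As an illustration, a role cardinality check can be sketched as follows; the function name and the list-of-agents encoding are ours, not part of any of the models' concrete syntax.

```python
# Illustrative sketch of role cardinalities as found in AGR and MOISE+:
# a role declared in a group admits between min_agents and max_agents
# players in the context of that group.

def cardinality_respected(assigned_agents, min_agents, max_agents):
    """True iff the number of agents playing the role is within bounds."""
    return min_agents <= len(assigned_agents) <= max_agents

# Cf. the soccer example of Figure 3: exactly one goalkeeper per team.
ok = cardinality_respected(["ann"], 1, 1)
```

The same check, applied to subgroup counts instead of agent counts, captures group cardinalities.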

As a third point, we observe that the abstract notion of *structural relationships between roles* (class RoleRel, Table 4) admits diverse concrete subtypes, analogously to what happens with the notion of *goal relationships* (Table 2). In AGR, for instance, there exist two subtypes: *Correspondence* (which states that agents playing one role will automatically play another one) and *dependency* (which rules out the possibility of an agent playing one role if it is not playing another role). In MOISE+, three subtypes: *Links* (which declare the possible relationships of *communication*, *authority* and *acquaintance* between roles), *compatibility* (which determines that two roles can be played at the same time by the same agent) and *inheritance* (which states that one role, besides its own features, also has all the features, like links and compatibilities, of another role). In ISLANDER, two subtypes: The concept of *subroles* (which is similar to the concept of inheritance in MOISE+), and the concept of *static separation of duties* (which means the opposite of the concept of *compatibility* in MOISE+).

In OPERA, there is only one type of binary directed relationship between roles: The *dependency*. Nevertheless, differently from the other organizational models, the concept of *dependency between roles* in OPERA is not properly a structural but rather a functional relationship. In other words, in OPERA, the dependency relationship reflects directly the decomposition of a goal into subgoals, elements of the functional dimension. When one of the subgoals defined in the scope of a role is a goal of another role, then there exists a dependency relationship between the two roles in OPERA. This idea is different from the structural dependency present in AGR, which indicates that playing a given role is a prerequisite for playing another role.
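This reading of OPERA's dependency can be sketched as a derivation from the functional decomposition. The encoding of objectives and subobjectives as sets of strings, and the example role and goal names, are our own illustrative assumptions.

```python
# Sketch: role r1 depends on role r2 whenever some subobjective in the
# scope of r1 is an objective of r2 (the OPERA reading described in the
# text). The data layout is illustrative, not OPERA's concrete syntax.

def derive_dependencies(role_objectives, role_subobjectives):
    """role_objectives: role -> set of objectives it is responsible for.
    role_subobjectives: role -> set of subobjectives in its scope.
    Returns the set of (depender, dependee) pairs."""
    deps = set()
    for r1, subs in role_subobjectives.items():
        for r2, objs in role_objectives.items():
            if r1 != r2 and subs & objs:
                deps.add((r1, r2))
    return deps

# Hypothetical conference-management fragment: the chair's scope
# contains "review_done", which is an objective of the reviewer role.
objectives = {"chair": {"paper_assigned"}, "reviewer": {"review_done"}}
subobjectives = {"chair": {"review_done"}, "reviewer": set()}
```

Here `derive_dependencies` would yield a single dependency of *chair* on *reviewer*, mirroring how a functional decomposition induces a role dependency in OPERA.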

The last point that should be mentioned concerns the nature of the group definitions. In all analysed models, except MOISE+, agents playing any roles in the same group may, in principle, exchange messages. Further, in the absence of explicit constraints, such as incompatibilities or dependencies, the agents are free to play the roles they wish in a given group. In MOISE+, we have the opposite situation. If there are no explicitly stated communication links or compatibility relations between roles, the agents are not allowed to exchange messages or to play more than one role in the same group. Moreover, in MOISE+, communication links and compatibility relations are not limited to a single group, but can be specified between roles defined in different groups, leading to possible inter-group collaborations. In other models, such as AGR, this inter-group collaboration can also be achieved by means of explicitly defined correspondence links between two roles in different groups. In this way, an agent playing one of the roles automatically plays the other role and can participate in more than one group at the same time.

#### *4.3. Dialogical Dimension*

The dialogical dimension is characterized by concepts to prescribe (or describe) the direct interaction, by means of message exchange, that occurs between role-playing agents in order to achieve organizational goals. Among the organizational models considered in this work, only ISLANDER and OPERA offer explicit concepts for dialogical modeling (see Table 1). In these two models, the dialogical specifications are written according to closely related conceptual structures, as can be seen in Figure 4.

**Figure 4.** Similar dialogical specifications written in OPERA and ISLANDER for an agent based conference management system (example taken from [27] chapter 3). The class diagram (Ecore metamodel) in the center represents the conceptual pattern identifiable in the two approaches of dialogical modeling. The dotted arrows detail what concepts are captured by what classes of the metamodel.

#### 4.3.1. Hypergraphs of Scenes

On a macro level, both in ISLANDER and in OPERA, the direct interactions by message exchanging are partitioned into *scenes*. Scenes are structured and coordinated by means of directed *hypergraphs* (directed graphs where some edges, called *hyperedges*, can connect any number of source and target nodes) in which:


In a well-formed hypergraph of scenes, there is:


In ISLANDER, the hypergraphs of scenes are named "performance structures", and in OPERA they are called "interaction structures".
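A minimal sketch of such a directed hypergraph of scenes follows; the class, its methods, and the scene names (loosely based on the conference-management example of Figure 4) are our own illustrative assumptions.

```python
# Sketch of a directed hypergraph of scenes: each hyperedge connects a
# set of source scenes to a set of target scenes, as in ISLANDER's
# performance structures and OPERA's interaction structures.

class SceneHypergraph:
    def __init__(self):
        self.scenes = set()
        self.hyperedges = []  # list of (sources, targets) pairs

    def add_scene(self, name):
        self.scenes.add(name)

    def connect(self, sources, targets):
        """A hyperedge may join any number of source and target scenes."""
        assert set(sources) <= self.scenes and set(targets) <= self.scenes
        self.hyperedges.append((frozenset(sources), frozenset(targets)))

    def successors(self, scene):
        """Scenes reachable from `scene` through one hyperedge."""
        out = set()
        for sources, targets in self.hyperedges:
            if scene in sources:
                out |= targets
        return out

ps = SceneHypergraph()
for s in ("submission", "review", "notification"):
    ps.add_scene(s)
ps.connect({"submission"}, {"review"})
ps.connect({"review"}, {"notification"})
```

Because `connect` accepts sets on both sides, a single hyperedge can, for instance, synchronize several parallel review scenes into one notification scene, which is precisely what distinguishes a hypergraph from an ordinary graph of scenes.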

On a micro level, the interactions within each scene are governed by one or more predefined *dialogue scripts*. These scripts correspond to the concepts of *scene protocol* in ISLANDER and *interaction pattern* in OPERA. Dialogue scripts are not detailed in Figure 4. The reason is that the intra-scene (micro level) dialogical specifications have distinct natures both in ISLANDER and in OPERA, as will be discussed in the sequel.

In Figure 4, the conceptual pattern identified in the dialogical specifications of ISLANDER and OPERA is captured by means of an Ecore metamodel. This metamodel is described in detail in Tables 5 and 6.

#### 4.3.2. Particularities

In both ISLANDER and OPERA, the dialogical specification consists of a network of scenes in which all possible or desirable episodes of direct interaction within an organization are planned and orchestrated. As mentioned earlier, this common structure takes place at the macro level. This means that the joint activity characteristic of agent organizations is, from a broad point of view, ruled by the presented hypergraphs of scenes.

The main difference between ISLANDER and OPERA occurs at the micro level. In other words, restricting the point of view to each particular scene, instead of the network of scenes, the models analyzed have different ways to specify how agents can or should interact.

On the one hand, in ISLANDER, there is the notion of *scene protocol*. In a scene protocol, one represents in detail a communication protocol in which are specified all the involved roles and the sequencing of all possible message exchanges (*illocution schemes*). On the other hand, in OPERA, there is the notion of *interaction pattern*. Unlike scene protocols, an interaction pattern does not determine in detail the exchange of messages in a given scene. Instead, it delimits a partial order between scene states (*landmarks*) towards achieving the objectives related to the scene. Any detailed communication protocol used in a scene should respect the established interaction pattern.

#### *4.4. Normative Dimension*

The normative modeling dimension is characterized by the general concept of *norm* (*permissions*, *obligations*, etc.). Norms occur in organizational specifications as a mechanism that interrelates and complements the functional, structural and dialogical specifications. Three organizational models we have analyzed present concepts to create normative specifications. They are: MOISE+, ISLANDER and OPERA (see Table 1).

#### 4.4.1. The Concept of Norm

In the organizational models analyzed, unlike the other dimensions, the normative specifications are not created as graph-like structures. Instead, they assume the form of *textual normative expressions*. This general pattern is expressed in Table 7.


**Table 6.** Dialogical specification pattern: Contextual constraints.

#### 4.4.2. Particularities

In ISLANDER, norms are written as logical expressions in accordance with the format:

(*s*1, *γ*1) ∧ ... ∧ (*sm*, *γm*) ∧ *e*1 ∧ ... ∧ *en* ∧ ¬((*sm*+1, *γm*+1) ∧ ... ∧ (*sm*+*n*, *γm*+*n*)) → *obl*1 ∧ ... ∧ *oblp*

where (*s*1, *γ*1), ..., (*sm*+*n*, *γm*+*n*) are pairs of *scenes* and *illocution schemes*, *e*1, ..., *en* are *boolean expressions* over illocution scheme variables, ¬ is a *defeasible negation*, and *obl*1, ..., *oblp* are *obligations*. "The meaning of these rules is that if the illocutions (*s*1, *γ*1), ..., (*sm*, *γm*) have been uttered, the expressions *e*1, ..., *en* are satisfied and the illocutions (*sm*+1, *γm*+1), ..., (*sm*+*n*, *γm*+*n*) have *not* been uttered, the obligations *obl*1, ..., *oblp* hold" ([48] p. 38).

In OPERA, norms are specified using logical expressions written in a formalism called LCR (*Logic for Contract Representation*). There are three types of norms: *Obligations*, *permissions* and *prohibitions*. The following excerpt from ([27] p. 149) summarizes the syntax for writing these modalities.

```
<Norm>::= OBLIGED(<id>,<Norm-Form>)|PERMITTED(<id>,<Norm-Form>)|
          FORBIDDEN(<id>,<Norm-Form>)
```

OBLIGED(<id>,<Norm-Form>) represents an obligation of the agent playing the role referenced by <id> in achieving the state <Norm-Form>, described as an LCR formula ([27] chapter 4). Based on the notion of *obligation*, the concepts of *permission* and *prohibition* are defined. A permission PERMITTED(<id>,<Norm-Form>) is an abbreviation for ¬OBLIGED(<id>,¬<Norm-Form>). In turn, a prohibition FORBIDDEN(<id>,<Norm-Form>) means the same as OBLIGED(<id>, ¬<Norm-Form>).
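These equivalences can be sketched by reducing each modality to OBLIGED over possibly negated formulas. The tuple encoding of formulas and the helper names are ours, not OPERA's LCR syntax.

```python
# Sketch of the interdefinability of OPERA's norm modalities:
# PERMITTED(id, F) = not OBLIGED(id, not F)
# FORBIDDEN(id, F) = OBLIGED(id, not F)
# Formulas are encoded as nested tuples for illustration only.

def Not(formula):
    """Toggle an outer negation on a formula term."""
    if isinstance(formula, tuple) and formula[0] == "not":
        return formula[1]
    return ("not", formula)

def obliged(role, formula):
    return ("OBLIGED", role, formula)

def permitted(role, formula):
    # Abbreviation for the negation of an obligation to the contrary.
    return ("not", obliged(role, Not(formula)))

def forbidden(role, formula):
    # An obligation to achieve the negated state.
    return obliged(role, Not(formula))
```

With this encoding, forbidding a role a state and obliging it to the negation of that state yield the same term, which is exactly the definitional identity quoted from [27].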


Finally, in MOISE+, the general concept of *norm* is translated into the notion of *deontic relations* that link *roles* to *missions*. There are two types of deontic relations, *permissions* and *obligations*:

"A permission *per*(*ρ*, *m*, *tc*) states that an agent within the role *ρ* may be committed to the mission *m*. Temporal constraints (*tc*) are established for the permission, that is, they determine a set of time periods when the permission is valid ... An obligation *obl*(*ρ*, *m*, *tc*) states that an agent within the role *ρ* is required to commit to the mission *m* in the time periods determined by *tc*." ([49] pp. 46–47)

In this case, the normative expressions *per*(*ρ*, *m*, *tc*) and *obl*(*ρ*, *m*, *tc*) are less comprehensive than what is found in OPERA and ISLANDER.
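A sketch of these deontic relations follows, with the temporal constraint *tc* modelled simply as a set of valid time periods; this flat encoding is our assumption, as MOISE+'s temporal constraints are richer.

```python
# Sketch of MOISE+ deontic relations per(rho, m, tc) and obl(rho, m, tc):
# a role (rho) is permitted or obliged to commit to a mission (m)
# within the time periods determined by tc.

from dataclasses import dataclass

@dataclass(frozen=True)
class DeonticRelation:
    kind: str           # "permission" or "obligation"
    role: str           # rho
    mission: str        # m
    periods: frozenset  # tc: periods in which the relation is valid

    def holds_at(self, period):
        """True iff the relation is valid in the given time period."""
        return period in self.periods

# Hypothetical example in the spirit of the soccer team of Figure 3.
obl = DeonticRelation("obligation", "goalkeeper", "m_defend",
                      frozenset({"first_half", "second_half"}))
```

The `kind` field distinguishes the two modalities; everything else is shared, mirroring how *per* and *obl* differ only in deontic force, not in structure.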

#### *4.5. Abstract Organizational Metamodel*

All the patterns identified in the organizational modeling dimensions, and previously discussed in Sections 4.1–4.4, can be combined to form an *abstract organizational metamodel*. This abstract metamodel, as shown in Figure 5, characterizes the common conceptual structure of the organizational models analyzed.

**Figure 5.** Abstract organizational metamodel.

By means of this abstract organizational metamodel, we can see that the normative dimension works as a glue among the three others. It interrelates and/or regulates the organization behavior (be it functional or dialogical) and the organizational internal structure (in the sense of allowing or forcing the association of certain functional and/or dialogical elements with certain structural elements). Last, but not least, it makes clear that structuring organizational modeling into dimensions greatly helps in keeping the notions independent and self-contained, while linking them via normative bonds.

#### **5. Integration Method Application**

In this section we present an application of the integration method described in Section 3, guided by the conceptual patterns identified in Section 4. We show how to apply the method to integrate the AGR, STEAM and MOISE+ models. Since a complete description of the integration process involves many details, we will restrict the discussion to the structural dimension.

As shown in Figure 1, we need to perform two iterations of the method to integrate three models. First, we merge AGR and STEAM. Then, we merge MOISE+ with the result of the first iteration.

#### *5.1. First Iteration*

The representation of AGR and STEAM as Ecore/OCL metamodels is shown in Figure 6. Below, on the left side, we have the AGR metamodel; on the right side, the STEAM metamodel (both restricted to the structural dimension). Above, mediating the alignment of the metamodels, we see the conceptual pattern of Section 4.2.

**Figure 6.** Alignment between AGR and STEAM.

In the alignment, the *organizational structure* of AGR and the *organization hierarchy* of STEAM are identified as similar specifications. The concepts of *group* (AGR) and *subteam* (STEAM) are declared similar concepts, both identified with the general concept of *group definition*. The same happens with the concepts of *role* (AGR) and *role* (STEAM), both identified with the general concept of *role definition*.

Intuitively, when we take into account only the terms used, these basic correspondences between AGR and STEAM are reasonable. However, when we look more closely at the specific relationships among the concepts, it is possible to see that there is a stronger coupling between *subteam* and *role* in STEAM than the one that exists between the corresponding concepts of *group* and *role* in AGR. In STEAM, the notion of *role* is abstract, being materialized both in the specification of activities for groups of agents as a whole and in the specification of activities for individual agents (*individual role*). This notion is represented in the metamodel as an abstract class STEAM::Role with two concrete subtypes STEAM::SubTeam and STEAM::IndividualRole. As a consequence, every instance of STEAM::SubTeam, besides corresponding to a *group definition*, is also a kind of *role definition*.

This coupling between the concepts of *group* and *role definitions* is absent in AGR and is not foreseen in the structural pattern of Section 4.2. To ease the merging of these different views regarding the nature of the concepts, one possibility is to interpret the generalization relation between STEAM::SubTeam and STEAM::Role as a composition relationship, similar to the application of the "replace inheritance with delegation" refactoring, as proposed in [50]. When this is done, we posit a derived reference from STEAM::SubTeam to STEAM::Role named supRoleDef. This derived reference, when used in the place of the original generalization, decouples the concepts of *group definition* and *role definition* while permitting to represent the same information in a slightly different way.
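The refactoring can be sketched in plain Python, contrasting the inherited and the delegated views; the class names follow the metamodel discussion, but the code itself is only an illustration of the design move, not part of any of the metamodels.

```python
# Sketch of "replace inheritance with delegation" applied to
# STEAM::SubTeam: instead of SubTeam inheriting from Role, it holds a
# derived reference (supRoleDef) to the Role it also represents.

class Role:
    def __init__(self, name):
        self.name = name

class SubTeamViaInheritance(Role):
    """Original view: every SubTeam *is* a Role."""
    pass

class SubTeamViaDelegation:
    """Refactored view: a SubTeam *refers to* its Role via supRoleDef,
    decoupling group definitions from role definitions."""
    def __init__(self, name):
        self.name = name
        self.sup_role_def = Role(name)

defense_a = SubTeamViaInheritance("defense")
defense_b = SubTeamViaDelegation("defense")
```

In the delegated view the same information survives (the subteam still knows its role), but a subteam is no longer itself an instance of Role, which is what allows the merged metamodel to keep GroupDef and RoleDef separate.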

In Figure 6 there are two other derived references: roleDef and subGroup, both extracted from the original reference role between STEAM::SubTeam and STEAM::Role. The rationale for these is to make explicit that, in reality, not only *role definitions* but also *group definitions* can be associated with a *subteam* via the reference role. Once these derived references become explicit, we can do a finer-grained matching between the STEAM metamodel and the corresponding abstract concepts of *group* and *role definition*.

Ending the comparison between the metamodels, we note that in AGR the *group definitions* cannot be decomposed into *subgroups*, and that there is the concept of *role relation* materialized as *role constraints* (*dependencies* and *correspondences*). In STEAM, there are no explicit *role relations* and no explicit *role cardinality*. In the case of *individual role*, there is an implicit cardinality of *min = 1* and *max = 1*.

Finally, concluding the iteration, we merge the metamodels taking into account the correspondences identified. The resultant integrated metamodel *MMint* #1 is shown in Figure 7, where we retain the terminology of the abstract structural pattern of Section 4.2. See online version for colors. The elements added to the structural pattern from the specific metamodels are depicted in blue. The elements marked in red in the STEAM metamodel (Figure 6) are not included in the integrated metamodel, being replaced by the derived references mentioned above. Essentially, *MMint* #1 is an *amalgamated sum* of the AGR and STEAM metamodels (viewed as graphs) modulo the alignment (articulation) between AGR and STEAM, as described in Section 3.4. For simplicity, the morphisms from *MMint* #1 to the AGR and STEAM metamodels are omitted.

#### *5.2. Second Iteration*

In the second iteration, we integrate the MOISE+ metamodel (structural dimension) with the metamodel *MMint* #1 obtained in the previous iteration. The MOISE+ metamodel is shown on the left side of Figure 8. On the right side, we have *MMint* #1 (from Figure 7) augmented with derived classes and relationships. In the middle, there is the articulation graph between the MOISE+ and *MMint* #1 metamodels. Since the articulation graph preserves the class and reference names from *MMint* #1 , for simplicity we have omitted the morphism *m*2 : *Gart* → *MMint* #1 .

In *MMint* #1 there are three classes, GroupDef, RoleDef and RoleRel, which represent the main concepts for the structural specification of agent organizations. In MOISE+, the corresponding classes are GroupSpecification, Role and RoleRelation, respectively. Similar to class *MMint* #1 ::GroupDef, class MOISE+::GroupSpecification represents the definition of a *group* in which it is possible to specify *roles* and *subgroups*. Like *MMint* #1 ::RoleDef, the class MOISE+::Role denotes a *role definition* associated with *group definitions*. Both *MMint* #1 ::RoleRel and MOISE+::RoleRelation characterize *role relationships* from a source to a target *role definition*.

Apart from this basic agreement, there are some particularities regarding how these concepts occur in MOISE+ that lead to an extension of *MMint* #1 . A first particularity is the way in which *group definitions* are linked to *role definitions* and *subgroups*. In the integration of AGR and STEAM, *group definitions* are linked to *role definitions* and to *subgroups* by means of the roleDef and subGroup references, respectively. In the MOISE+ metamodel, the corresponding links are represented not by references but by the classes GroupRole and SubGroup, respectively.

**Figure 7.** Integrated metamodel *MMint* #1 .

By means of MOISE+::GroupRole and MOISE+::SubGroup, the same *role definition* or *subgroup* can have different cardinalities, one for each *group definition* in which the definition is referenced. The cardinalities are represented by the attributes max (for the maximum number of agents per role, or subgroups in a group) and min (for the minimum number).

In *MMint* #1 this flexibility is not possible, since the cardinalities are declared directly as attributes of the *role definition*, and not as attributes of the relation between a *group definition* and a *role* or *subgroup definition*. In this way, we note that the information about role and group cardinalities of MOISE+ cannot always be expressed in the current integration of AGR and STEAM. However, the converse is always possible, as can be shown by means of the derived classes *MMint* #1 ::RoleRef and *MMint* #1 ::GroupRef in the upper right of Figure 8.

The derived class *MMint* #1 ::RoleRef is the implicit correspondent of MOISE+::GroupRole. Similar to class MOISE+::GroupRole, class *MMint* #1 ::RoleRef has attributes max and min and makes reference to a single *role definition*. In the context of *MMint* #1 ::GroupDef, the derivation of *MMint* #1 ::RoleRef is specified by the invariant GD2 shown in the bottom right of Figure 8. This invariant establishes that for each instance rd of *MMint* #1 ::RoleDef (referenced by roleDef), there must exist (be created) an instance rr of *MMint* #1 ::RoleRef that points to rd and has the attributes rr.min = rd.min and rr.max = rd.max.

In turn, the class *MMint* #1 ::GroupRef is the derived correspondent of MOISE+::SubGroup. In the context of *MMint* #1 ::GroupDef, the derivation of *MMint* #1 ::GroupRef is specified by the invariant GD3 shown in the bottom right of Figure 8. The invariant establishes that for each instance sg of *MMint* #1 ::GroupDef (referenced by subGroup), there must exist (be created) an instance gr of *MMint* #1 ::GroupRef pointing at sg and having the attributes gr.min = sg.supRoleDef.min and gr.max = sg.supRoleDef.max.
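Invariant GD2 can be sketched as a derivation of RoleRef instances from the RoleDef instances referenced by a group definition; the plain-Python encoding and the helper function are our own illustrative rendering of the invariant.

```python
# Sketch of invariant GD2: for each RoleDef referenced by a group,
# derive a RoleRef carrying the role's min/max cardinalities
# (rr.min = rd.min and rr.max = rd.max).

class RoleDef:
    def __init__(self, name, min_agents, max_agents):
        self.name, self.min, self.max = name, min_agents, max_agents

class RoleRef:
    def __init__(self, role_def):
        # Cardinalities are copied from the referenced role definition,
        # as required by GD2.
        self.role = role_def
        self.min, self.max = role_def.min, role_def.max

def derive_role_refs(role_defs):
    """One derived RoleRef per RoleDef referenced by a GroupDef."""
    return [RoleRef(rd) for rd in role_defs]

refs = derive_role_refs([RoleDef("goalkeeper", 1, 1), RoleDef("back", 0, 5)])
```

GD3 follows the same shape, copying the cardinalities of a subgroup's supRoleDef into a derived GroupRef.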

Since they are more expressive, the classes RoleRef and GroupRef are used in the articulation graph, replacing the references roleDef and subGroup present in *MMint* #1 . Therefore, the replaced references are marked to be left out during the merge step at the end of the iteration. In addition, the attributes max and min in the class *MMint* #1 ::RoleDef are also marked, since the same information is now represented as attributes of RoleRef and GroupRef in the articulation graph.


A second peculiarity of MOISE+ concerns the possibility of defining *links* and *compatibilities* between *roles* as part of a *group specification*. In this regard, there are two observations to be made. Firstly, *links* and *compatibilities* are new kinds of *role relation*, not present in *MMint* #1 . In fact, we observe that the *link* and *compatibility* concepts, in essence, differ from the *dependency* and *correspondence* concepts found in AGR. On the one hand, in MOISE+, a *link* enables *acquaintance*, *communication*, or *authority* between roles; and a *compatibility* indicates that an agent playing a role can also play another role. On the other hand, in AGR, a *dependency* indicates that an agent can only play a role if it previously commits itself to another one; and a *correspondence* means that playing a role automatically implies playing another role.

The second observation is about the nature of the *group definition* concept behind the notions of *links* and *compatibilities*. By default, a *group definition* in MOISE+ does not enable *compatibility* or any *link* between roles. In other words, if not stated explicitly, an agent playing a role does not have permission to play another role, or even to interact with agents playing another role, neither in the same nor in other groups. If *compatibilities* and *links* are needed, they must be explicitly specified in the *group definition*.

Conversely, in AGR and STEAM a *group definition* does not imply *a priori* any restriction regarding *compatibility* and *links* among the roles. With the exception of explicit *dependency* relationships and cardinality restrictions, in AGR and STEAM specifications the agents are free to play the roles they want and are not blocked from interacting with agents playing any other role.

In the articulation presented in Figure 8, these observations are made explicit by means of the derived attributes allowsComm, allowsAcqu and allowsComp in the context of *group definitions*. For MOISE+::GroupSpecification, these attributes have the value false, representing respectively the *communication*, *acquaintance* and *compatibility* restrictions existing in MOISE+. For *MMint* #1 ::GroupDef, on the other hand, the three attributes assume the value true, indicating the absence of the respective restrictions in AGR and STEAM.

Despite their opposite nature, we note that AGR and STEAM *group definitions* can be expressed in MOISE+. To this end, one has to explicitly define *communication*, *acquaintance*, and *compatibility* relationships between all roles in a *group definition*. The converse, however, is not always possible without losing information.

Concluding the comparison, MOISE+ has a third form of *role relationship*: *inheritance*. In the MOISE+ metamodel this relationship is represented by the reference superRole involving instances of the class MOISE+::Role. In the articulation between MOISE+ and *MMint* #1 , an alternative would be to represent inheritance as a subclass of RoleRel. However, as the inheritance relation, unlike the other role relations, does not have a direct effect on the behavior of the agents, being only a way of simplifying role definitions in MOISE+, we have opted to preserve its representation as the reference superRole rather than defining a new subclass of RoleRel.

Finally, concluding the iteration, we merge the metamodels taking into account the identified correspondences. The result is shown in Figure 9.

**Figure 9.** Integrated metamodel *MMint* #2 .

#### **6. Organizational Interoperability Approach**

Adopting an organization-centered perspective [4], the engineering of MAS can be described as a process that starts with the creation of an *organizational specification* written in conformance to an *organizational model*. This specification is a prescription of the desired patterns of joint activity that should occur inside the MAS towards some desired purpose. Once the organizational specification is done, it is used as the input to an *organizational infrastructure*. In general, what we mean by an organizational infrastructure is some kind of middleware that interprets the specification and reifies the *organization* of the MAS outside the agents. In this respect, it should maintain an internal *organizational state* and offer the agents an interface for accessing and modifying this state. The information maintained in the organizational state includes the list of the members of the organization, the roles they are playing, and the groups active in the organization, among others. Finally, with the organizational infrastructure materializing the desired agent organization, it is time to develop application-domain agents (not necessarily by the same designer of the organizational specification) that can enter and interact inside it by accessing the available organizational interface.
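
For illustration, the organizational state and interface described above can be rendered as a minimal Python sketch (class and method names are our own invention, not taken from any concrete middleware):

```python
from dataclasses import dataclass, field

# Illustrative organizational state: members, role assignments, active groups.
@dataclass
class OrganizationalState:
    members: set = field(default_factory=set)
    role_assignments: dict = field(default_factory=dict)  # agent -> {(role, group)}
    active_groups: set = field(default_factory=set)

class OrganizationalInfrastructure:
    """Middleware facade: interprets a specification and lets agents
    query and modify the reified organizational state."""

    def __init__(self, specification):
        self.spec = specification
        self.state = OrganizationalState()

    def enter(self, agent):
        """An agent becomes a member of the organization."""
        self.state.members.add(agent)

    def adopt_role(self, agent, role, group):
        """A member adopts a role within a group, activating the group."""
        if agent not in self.state.members:
            raise ValueError(f"{agent} is not a member of the organization")
        self.state.role_assignments.setdefault(agent, set()).add((role, group))
        self.state.active_groups.add(group)
```

A real infrastructure would additionally enforce the constraints of the underlying organizational model (cardinalities, links, compatibilities) before accepting each operation.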

Regarding *organizational infrastructures*, there are several approaches for the engineering of (open) agent organizations [12,13,36,47,51–54]. On the one hand, the availability of a wide range of diverse models and infrastructures has made the development of agent organizations feasible. On the other hand, such diversity introduced an important new interoperability challenge for agent designers: How to deal with heterogeneous organizational models and infrastructures? Whenever an agent is built to enter some MAS, it has to be able to interact with the other participants using a particular agent communication language, as well as to understand received messages against a given domain ontology. Besides this, if the MAS was designed as an agent organization, the entering agent also has to be able to access a particular organizational infrastructure and to interpret its underlying organizational model. In this way, the agent design can become tailored to a particular organizational approach.

For instance, suppose that several e-business applications designed as open agent organizations are available on the Internet. In addition, assume that these applications are heterogeneous regarding the organizational technology applied to build them. To put it in more concrete terms, let us suppose two agent organizations: One built upon the S-MOISE+ [53] organizational middleware, based on the MOISE+ model, and the other using the MADKIT [54] platform, based on the AGR model. In this setup (and assuming a shared common agent communication language and domain ontology), the agent designers face the following problem: The native S-MOISE+ agents do not interoperate with the MADKIT platform, and vice-versa. Thus, it is not directly possible, for instance, to write agent code that enters both e-business agent organizations in search of products and/or services on behalf of its users. This limits the range of applicability of S-MOISE+ and MADKIT agents which, in turn, limits the idea of open MAS.

As mentioned in the Introduction, four basic approaches can be envisioned for this organizational interoperability problem. One of them is to bridge the interface between the external agents and the agent organization by means of model mapping (Adaptation). By using such mappings it is possible to provide adapted copies of the specification and state of a given organizational model/infrastructure "understood" by the external agents.

In what follows we describe MAORI—a *Model-based Architecture for ORganizational Interoperability* [19]. MAORI is an experimental framework for providing organizational interoperability following the line of adaptation. Its main objective is to show how the integration of organizational models presented in this paper can possibly be used in a solution for the problem of organizational interoperability.

#### *6.1. MAORI Overview*

MAORI is structured along three layers, as shown in Figure 10.


#### *6.2. Model Integration Layer*

The M2M layer is composed of *metamodels* and *transformations*. For each organizational model *OMi*, there is a corresponding metamodel *MMi*. The metamodel *MMint* is the conceptual integration of all the *MMi*, as described in Section 5.

The transformations are functions that implement the morphisms between the integrated metamodel *MMint* and the particular metamodels *MMi* (Section 3.4). There are two types of transformations. One type is **transf**(*from* : *MMi*) : *MMint*, which converts from *MMi* to *MMint*. The other type is **transf**(*from* : *MMint*) : *MMi*, which converts from *MMint* to *MMi*. In this way, the main functionality of the M2M layer is to provide transformations that can be combined to translate specifications and states between organizational models/infrastructures.
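
As an illustration of how the two transformation types compose, consider the following sketch (in Python, over dictionary-based "specifications" with hypothetical mapping rules; the actual MAORI transformations were prototyped in ATL and ported to Java):

```python
# Illustrative transformation pair: each direction renames model-specific
# concepts according to a mapping table derived from the metamodel morphisms.
def transf_to_int(spec_mmi, to_int_rules):
    """transf(from: MMi) : MMint — lift a model-specific spec into MMint."""
    return {to_int_rules[k]: v for k, v in spec_mmi.items() if k in to_int_rules}

def transf_from_int(spec_int, from_int_rules):
    """transf(from: MMint) : MMi — project an MMint spec onto a target MMi."""
    return {from_int_rules[k]: v for k, v in spec_int.items() if k in from_int_rules}

# Composing the two directions translates between two models via MMint
# (the concept names below are taken from the metamodels discussed above;
# the mapping tables themselves are simplified for illustration):
agr_to_int = {"Group": "GroupDef", "Role": "RoleDef"}
int_to_moise = {"GroupDef": "GroupSpecification", "RoleDef": "Role"}

agr_spec = {"Group": "team", "Role": "writer"}
moise_spec = transf_from_int(transf_to_int(agr_spec, agr_to_int), int_to_moise)
# moise_spec now expresses the same content in MOISE+ terms
```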

#### *6.3. Organizational Interoperability Layer*

The ORI layer works as an extension of the organizational infrastructures. In order to enable heterogeneous agents to participate in the same organization, ORI adds two basic components to the organizational infrastructures: *providers* and *adapters*.

Providers are responsible for exporting the organizational specification/state of agent organizations. In this case, to export means to use **transf**(*from* : *MMi*) : *MMint* to convert the specification/state from a source *MMi* to the integrated metamodel *MMint*. Adapters are responsible for importing the organizational specification/state that was exported by a provider. The import is done by using **transf**(*from* : *MMint*) : *MMi*.

Imagine a scenario where an agent runs on a given organizational infrastructure, and consider an agent organization implemented on a different organizational infrastructure. If the agent wants to participate in the organization, an adapter has to be instantiated in the organizational infrastructure of the entering agent. The responsibility of the adapter is to locate the appropriate provider, establish a connection with it, ask for the organizational specification/state and, finally, translate this specification/state to the target organizational infrastructure of the entering agent. In this way, for each heterogeneous agent organization there will be an organizational provider. Connected to this provider, there will be several organizational adapters, one for each organizational infrastructure in which there could be external heterogeneous agents.
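
The provider/adapter pair can be sketched as follows (a hypothetical Python rendition of the roles described above, with the transformations passed in as plain functions; connection setup and provider discovery are omitted):

```python
class Provider:
    """Exports the organizational specification/state of a source
    infrastructure, converted into the integrated metamodel MMint."""

    def __init__(self, source_state, to_int):
        self.source_state = source_state
        self.to_int = to_int  # transf(from: MMi) : MMint

    def export_state(self):
        return self.to_int(self.source_state)

class Adapter:
    """Imports the state exported by a provider into the target
    infrastructure, using transf(from: MMint) : MMi."""

    def __init__(self, provider, from_int):
        self.provider = provider
        self.from_int = from_int

    def import_state(self):
        return self.from_int(self.provider.export_state())

# Illustrative usage: a MADKIT-side provider and an S-MOISE+-side adapter
# (the mapping tables are simplified for illustration).
madkit_state = {"Group": "team", "Role": "writer"}
to_int = lambda s: {{"Group": "GroupDef", "Role": "RoleDef"}[k]: v
                    for k, v in s.items()}
from_int = lambda s: {{"GroupDef": "GroupSpecification", "RoleDef": "Role"}[k]: v
                      for k, v in s.items()}
imported = Adapter(Provider(madkit_state, to_int), from_int).import_state()
```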

**Figure 10.** MAORI framework (redesigned from [19]).

#### *6.4. MAORI Implementation*

MAORI was implemented in the Java programming language. The metamodels in the M2M layer were coded using the Eclipse Modeling Framework (EMF) [55]. Regarding the transformations, they were first prototyped in the Atlas Transformation Language (ATL) [56] and then ported to Java for performance reasons. As a proof of concept of the ORI layer, an implementation was developed considering the MADKIT and S-MOISE+ organizational infrastructures.

To evaluate MAORI, some agent organizations were developed. One is the example of a group of agents that want to write a paper and use an explicit organization to help them collaborate. The organization consists of a group composed of: One agent in the role of *coordinator* (who controls the process and writes the introduction and conclusion of the paper), one to five agents in the role of *collaborator* (who write the paper sections) and one agent in the role of *librarian* (who compiles the bibliography). Taking this simple example, some experiments were performed. One of them considered an organization composed of five agents: one coordinator (Eric), three collaborators (Greg, Joel and Mark) and one librarian (Carol). Initially, the organization was started in the MADKIT platform, where the agents Eric and Carol were started; after that, an organizational adapter was started in the S-MOISE+ platform to import the organization state. The three remaining agents (Greg, Joel and Mark) were then started in S-MOISE+. They perceived and entered the organization by requesting the role of collaborator. At this point, the interaction begins: The agents in S-MOISE+ are now members of an organization running in MADKIT. More details about MAORI may be found in [19].
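
The role cardinalities of this paper-writing group can be sketched as follows (an illustrative Python rendition; the dictionary layout is our own, not S-MOISE+ or MADKIT syntax):

```python
# Cardinalities of the paper-writing group, as described above.
paper_group = {
    "coordinator": {"min": 1, "max": 1},   # controls the process
    "collaborator": {"min": 1, "max": 5},  # write the paper sections
    "librarian": {"min": 1, "max": 1},     # compiles the bibliography
}

def well_formed(assignments, group_spec):
    """Check that the number of agents per role respects the cardinalities."""
    for role, card in group_spec.items():
        n = sum(1 for r in assignments.values() if r == role)
        if not card["min"] <= n <= card["max"]:
            return False
    return True

# The experiment's five-agent configuration satisfies the specification:
experiment = {"Eric": "coordinator", "Greg": "collaborator",
              "Joel": "collaborator", "Mark": "collaborator",
              "Carol": "librarian"}
```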

#### **7. Related Work**

The proposition of abstract structures, such as patterns and an integrated metamodel, to enable interoperability among organization-centered multiagent systems is relatively new. Therefore, related work must be considered in several areas, from business to services, including multiagent systems. In the following we contextualize our work within this broad scenario.

In the multiagent systems area, Pechoucek and Marik [57] adopted a model-driven approach to propose a general metamodel for developing multiagent systems. The metamodel is defined as a Platform Independent Model (PIM), following the MDA abstraction levels [18]. In order to identify a unified metamodel to support the development of agent-based systems, they considered seven views: Multiagent view; Agent view; Behavioural view; Organization view; Role view; Interaction view and Environment view. They adopted a top-down approach to develop the metamodels representing each of these views, after analysing existing agent-oriented modeling languages, methodologies and programming languages. Moreover, the metamodel was conceived to be used independently of any agent-oriented methodology, modeling language or programming language. Nevertheless, their main purpose was to support the development of MAS using a model-driven approach, rather than to provide means for interoperability among MAS. Our work presents some similarities with theirs, since the proposed integrated metamodel can be used independently of the organizational model adopted to design and implement an organization-centered MAS. In addition, our abstract structures were defined based on the dimensions of organization-centered multiagent systems, similarly to the way they structured their metamodel around views. Nevertheless, although also using a model-driven approach, we adopted it to define the way integration occurs, following a bottom-up approach to define the integrated metamodel based on existing agent organization models and their underlying metamodels. By doing so, we foster interoperability for organization-centered MAS at design time, by providing means of transforming the design of a MAS with one underlying organizational model into another one, and at execution time, as presented in Section 6.

Muramatsu and colleagues [58] provided organizational interoperability by using organizational artifacts within the environment where the MAS is situated. They adopted a normative language to describe the organizational structure in artifacts. In this sense, their work is similar to ours in adopting a common language (in our case, a common metamodel) to describe several organizational models.

Isern and colleagues [59] classified organizational structures according to organizational paradigms, such as (i) hierarchy, (ii) holarchy, (iii) coalition, (iv) team, (v) congregation, (vi) society, (vii) federation and (viii) market, to support the design of MAS using existing agent-oriented methodologies and organizational models. The main purpose of their work is to provide information for MAS developers who would like to adopt an organizational approach to develop MAS and do not know which organizational model or agent-oriented methodology to choose. Their work is related to ours in the sense that they used metamodel characteristics of existing organizational models and existing agent-oriented methodologies to classify such organizational structures as patterns.

Karaenke et al. [60] proposed an inter-organizational interoperability architecture based on multiagent systems, web services and semantic web technologies. In their work, the MAS does not have an underlying organizational model, and the agents adopt the "head body" paradigm to include web services technologies in order to provide interoperability among enterprise information systems. Therefore, interoperability is focused on system-to-system communication using web services technologies, which limits the kind of systems that may participate in such communication. Our proposition is broader in the sense that it provides a solution for interoperability among open organization-centered MAS independently of their underlying organizational model.

A template description for agent-oriented patterns was given by Oluyomi and colleagues [61]. Based on a classification scheme, they organized agent technology concepts into categories and then identified agent-oriented pattern description templates for each category. Eight agent-oriented pattern templates were described to support the modeling of multiagent systems. Examples of the conformity between the proposed templates and their adoption during the design phase of existing agent-oriented methodologies were provided. Compared with our proposition, their work adopts a similar approach in considering categories (in our case, dimensions) to guide the pattern definitions. However, their work is situated at the model level instead of the metamodel level, as ours is, since their objective is to improve communication among AOSE developers. Moreover, the adoption of an integrated metamodel obtained via model-driven transformations, combined with organizational dimensions, gives our patterns a higher level of formality compared with theirs.

Chella et al. [62] defined agent-oriented patterns for developing multiagent systems to support robot programming. The proposed patterns were created based on an existing layered architecture for programming robots [63]. Patterns are described considering three aspects: The problem description; the definition of the solution in terms of MAS models; and the description of the solution in terms of implementation. They define some patterns for a specific domain, based on the templates proposed by [61]. The only relation between their work and ours is that their pattern definition is also based on criteria that can be classified as dimensions or categories.

Organizational interoperability and integration issues are not new concerns for administrative practice and research, especially after the wide acceptance and use of Information and Communication Technologies in business models [64]. Several frameworks to provide organizational interoperability have been defined and, even in this domain, some dimensions were considered to define such frameworks.

#### **8. Conclusions and Future Work**

The research reported in this work has consisted in the use of Model-Driven Engineering techniques to address the *organizational interoperability problem*: How can we provide means for a set of agents, immersed in a common environment, to evolve, reason, decide and interact with each other based on organizational concepts, given that their organizational models may differ? In order to achieve this goal, we have proposed an abstract and integrated view of the main concepts that have been used to specify agent organizations, based on the analysis of several organizational models present in the literature. In this model, we captured the recurring modeling concepts, which were coherently combined into an abstract conceptual structure. We have then presented an adaptation-based solution for the organizational interoperability problem, in which we defined the mappings between different organizational models by using this abstract conceptual structure. We built our abstract conceptual structure based on six organizational models (STEAM, MOISE+, AGR, OPERA, TEAMS, ISLANDER), presented in Section 2. For brevity, in Section 5 we illustrated the application of our integration method using three of these models (MOISE+, AGR, STEAM), and concerning exclusively the structural dimension.

A first extension of this work would be to build an integrated metamodel that could cope with OPERA and TEAMS. Moreover, we could evaluate how the other organizational models mentioned in Section 2.2.7 would affect our abstract conceptual structure. Concerning the MAORI framework, described in Section 6, we have tested its use by interoperating two organizational infrastructures, S-MOISE+ and MADKIT. A second extension of this work would be to test the framework with other organizational infrastructures, like AMELI [13] and ORA4MAS [52]. Finally, we would like to test our model-driven approach on other MAS interoperability problems, like the ones mentioned in Section 1.

**Author Contributions:** Conceptualization, L.R.C., J.S.S. and O.B.; Formalization, implementation and experiments, L.R.C. and A.A.F.B.; Supervision, J.S.S. and O.B.; Writing and proofreading the paper, all authors.

**Funding:** This research was partially funded by FAPEMA grant number 127/04 and CAPES, Brazil, Grant 1511/06-8. Jaime Sichman was partially supported by CNPq and FAPESP, Brazil. Anarosa Alves Franco Brandão was partially supported by grants #010/20620-5 and #014/03297-7, São Paulo Research Foundation (FAPESP), Brazil.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **ARPS: A Framework for Development, Simulation, Evaluation, and Deployment of Multi-Agent Systems**

#### **Thiago Coelho Prado and Michael Bauer \***

Department of Computer Science, Western University, London, ON N6A 3K7, Canada; tprado2@uwo.ca **\*** Correspondence: bauer@uwo.ca; Tel.: +1-519-661-3562

Received: 18 September 2019; Accepted: 18 October 2019; Published: 23 October 2019

**Abstract:** Multi-Agent Systems (MASs) are often used to optimize the use of the resources available in an environment. A flaw during the modelling phase or an unanticipated scenario during execution, however, can make the agents behave in unplanned ways. As a consequence, the resources may not only be poorly utilized and operate sub-optimally, but may also be brought into an unexpected state. Such problems can be mitigated if there is a controlled environment in which to test the agents' behaviour before deployment. To this end, a simulated environment provides a way to test the agents' behaviour not only under different common scenarios, but also under adverse and rare conditions. With this in mind, we have developed ARPS, an open-source framework that can be used to design computational agents, evaluate them in a simulated environment modelled after a real one, and then deploy and manage them seamlessly in the actual environment once the results of their evaluation are satisfactory.

**Keywords:** multi-agent systems; discrete event simulator; interoperability; agent and multiagent applications

#### **1. Introduction**

In environments where resource management is critical, software agents can be employed to optimize those resources. When the task to accomplish this is too complex to be carried out by a single agent, multiple interacting agents can be used to achieve effectiveness. The Multi-Agent System (MAS) approach has been successfully applied in many domains, including: helping controllers in Air Traffic Control make decisions based on the aircraft's fuel availability, flight plan, weather, and any other relevant data [1]; optimizing distributed generators in smart grids for energy production, storage, and distribution [2]; or finding better arrangements for the components in manufacturing plants to increase throughput and optimize material usage [3].

Many frameworks and toolkits enable the implementation of a MAS. However, they are not designed to provide seamless means of evaluating the agents' outcomes under given scenarios before deployment. A flaw during the design and implementation of the agents, or an unpredicted state of the environment where they are acting, can not only interfere with the achievement of their goals, but can also bring the environment into an unexpected situation with unforeseen consequences.

Some work with MASs has made use of simulations to help deployed agents update their plans according to the current state of the environment. This means that the simulation is used as a planning tool, meant to help the agent make a decision under a specified near-future scenario, rather than being used in the process of designing the agents' behaviour before deployment. This strategy has been applied in the field of multiple Unmanned Aerial Vehicles (UAVs) [4], where communication is required for coordination and failures related to it can make the entities unreachable. The system simulates possible scenarios where communication is unavailable and an action by the UAV is expected. Another example is the simulation component used to detect conflicts and inconsistencies in resource allocation during high-level planning in a manufacturing plant [5].

There is, to the best of our knowledge, no general-purpose MAS framework that integrates the process of validating the agents in a simulated environment before their deployment. To address this, we have developed ARPS, an open-source framework available at https://gitlab.com/arps/arps/ under the MIT license [6], to seamlessly design, implement and assess the agents, and to deploy the MAS once the results meet established criteria. ARPS stands for some of the core properties of agents: autonomy, reactivity, proactivity, and social ability. We use the management of the resources of a data centre to illustrate our approach and how it can be applied in other domains.

In Section 2 we cover related work. Following this, Section 3 describes the background and architecture of the framework. The process of the implementation of the MAS to manage an experimental scenario is shown in Section 4. In Section 5 we discuss the findings, limitations, and future directions.

#### **2. Related Work**

There are multiple toolkits, platforms, and frameworks available for creating a MAS. In this section, we will describe a few of them. The works here by no means represent the only alternatives for creating MASs. For other options, refer to surveys on this topic, such as the one presented in [7].

The works reviewed here implement common aspects of a MAS to work in an actual environment, such as communication, interoperability, storage, security, and resource discovery. Therefore, the users can focus on the definition of the agents and their behaviour, how they are organized, and how they interact to solve a problem.

Among the popular MAS frameworks, JADE (also known as Java Agent Development Framework) [8] was created to address the problem of interoperability and provide an environment for the development of agents. It has no domain-dependent requirement, as seen in the other solutions. According to the authors, such dependencies were obstacles for the adoption of MAS technologies at the time of its conception. JADE simplifies the implementation of multi-agent systems through a middleware compliant with the FIPA (Foundation for Intelligent Physical Agents) specifications [9], a standard proposed for the interoperability of agents. The JADE authors argue that it is industry-driven and currently the best-known FIPA-compliant agent platform in the academic and industrial community.

The A-Globe platform [10] is designed for testing experimental scenarios featuring agents' positions that require a Geographical Information System (GIS), where agents may suffer from communication inaccessibility, either because of the spatial distance between agents or because of broken links. Because it is a closed environment, interoperability is not one of the concerns of this platform. Hence, it is not fully compliant with the FIPA specifications on inter-platform communication, albeit it provides compliance with the Agent Communication Language (ACL), a structure for composing messages exchanged by agents.

Based on the fulfillment of requirements such as robustness, security, and the ability to ensure that a partial solution can be executed when an optimized one is not found due to constraints, DARPA funded a project called Cougaar [11] (Cognitive Agent Architecture). This agent platform was created to offer specialized support for logistics-related problems. This platform is also not FIPA-compliant. It aims to facilitate the development of agent-based applications that are complex, large-scale and distributed.

Other solutions offer more components built on top of existing agent-based platforms. The Jadex BDI Agent System [12] follows the Belief Desire Intention (BDI) model [13] and facilitates intelligent agent construction on top of other middleware such as JADE. It has been used to build applications in different domains, such as simulation, scheduling, and mobile computing. The programming model of Jadex allows for designing an application as a hierarchical decomposition of components interacting via services, and thus helps to make complexity manageable. These components can be used in concurrent and dynamic distributed systems. Another example is JaCaMo [14], a platform that integrates three projects with different MAS-related development paradigms: Jason [16], an interpreter for an extended version of AgentSpeak, a BDI agent-oriented logic programming language [15], for agent development; Moise [17] for agent organization; and CArtAgO [18] for environment-oriented programming.

Lastly, MadKit [19] offers a modular and scalable multiagent platform. Its central aspect is the ability to organize agents in groups and roles, aiming at the development of artificial societies. It is closely related to our approach in the sense that it has a simulator component. Nonetheless, the simulation and evaluation are not seamless in the workflow, leaving the responsibility for this integration to the user.

Previous work has often presented MAS tools that provide a base infrastructure to implement agents. These approaches also sought to address other specific problems, such as interoperability through standards, robustness, and security. In some cases, the work introduced approaches to enrich the design process and provide a clear definition of how agents are organized and what their roles are. However, these solutions lack a component that would allow a developer to assess the behaviour of the agents in a simulated environment before actual deployment. Further, this step should be as seamless as possible, i.e., it should take few or no modifications to make agents developed in the simulated environment run in the actual environment. This is the gap that ARPS fills.

#### *Agent-Based Modelling (ABM)*

One of the main solutions available to simulate complex systems is known as agent-based modelling (ABM). It is employed in domains such as social sciences, biology, ecology, engineering, and economics. There are many platforms for implementing ABM [20,21]. Among them, we list a few widely adopted examples. The Swarm package [22] provides object-oriented libraries of reusable components for building models and for analyzing, displaying, and controlling experiments on those models. NetLogo [23] provides packages to simulate multi-agent models in environments; it has a significant user community, is well documented, and offers many demos. MASON [24] is a discrete-event multi-agent simulation toolkit, designed to serve as the basis for a wide range of multi-agent simulation tasks, ranging from swarm robotics to machine learning to social complexity environments. The Repast platform, initially developed as a tool for the social sciences, is a family of agent-based modelling and simulation platforms. Currently, it not only provides a package to be used in regular environments, such as desktops and laptops [25], but also has an advanced version for HPC environments [26] to simulate more demanding scenarios.

The ABM and MAS concepts are closely related, since both are agent-based. The difference is that modelling in ABM aims to gain insight into the emergent properties of complex adaptive systems, while MAS focuses on actual agents [27,28].

Our framework aims to provide an environment combining ABM properties, such as the ability to model agents in a simulated environment and observe their behaviour, with MAS's ability to implement and deploy the agents evaluated during the simulation in the actual environment. The integration of both approaches can yield positive results. The framework in [29] features this combination: it enables mobile agents to work simultaneously in both the actual and the virtualized agent platform, enabling large-scale simulation to observe unknown emergent behaviour. This is accomplished by having the agents in the actual environment collect data in real time and improve the simulation, whose outcome can in turn be used in decision making. In our case, we need a separate simulator component to enable the study of the agents' behaviour, without disrupting the environment, before their deployment. This enables a developer to build reliable physical agents by evaluating their interactions and the effects of their actions. Because simulation can involve the compression of time, it is possible to simulate many different scenarios efficiently. Also, during the simulation, it is possible to create unexpected scenarios to see how the agents perform. This can be desirable in areas that already employ ABM for resource management during a disaster, such as [30], which could be extended with software agents that direct resources where they are needed. We aim to combine the characteristics of both simulation and development to offer an alternative for creating resource-management solutions using MAS.

Figure 1 illustrates our workflow to achieve this integration. A user/admin can create the environment, its resources, and the agents' models, and define how they are organized. The MAS environment is generated, and the user can simulate a scenario. After analyzing the simulation results, the user can decide to refine the models or apply the agents' models to the actual environment. The output of this environment can also be used as feedback to improve the existing models and create a more reliable simulation.

As we have discussed, none of the previous MAS frameworks have a simulator component that allows this workflow without reimplementing the concepts defined during the simulation in the actual environment.

**Figure 1.** Workflow for MAS conception, evaluation, and deployment.

#### **3. Our Approach**

Below we describe the background of the framework, our design choices, and present the architecture.

#### *3.1. Background*

Data centres are complex, dynamic environments where many resources are allocated to provide guaranteed, reliable services. These resources relate not only directly to the computer systems, such as processing power, storage capacity, and network, but also to the supporting equipment and environmental control. Data centre administrators face a daunting task in trying to manage them optimally. Misconfigured or overallocated resources impact the total cost of ownership through excess power consumption or idle resources. Data centre operators can also incur financial penalties due to broken guarantees related to service delivery.

One proposed way to tackle this problem is the adoption of autonomic computing architectures and strategies. An autonomic approach aims to embody the idea of self-management, which in turn can be realized by decomposing it into other sub-properties, such as self-configuration, self-healing, self-optimization, and self-protection [31]. This separation of concerns can be managed by decentralized autonomous agents that may interact with each other, optimizing local resources to achieve global optimization, as exemplified in [32], where a MAS manages the number of hosts available to process workloads depending on the demand.

One problem faced in the employment of MASs in data centres is the impracticality of having an actual data centre available to evaluate the effectiveness of the policies governing agents, due to costs and security concerns. In some cases, data centre simulators, such as that described in [33], have been developed to evaluate the possible impacts of the autonomic agents' policies in the data centre before implementing and deploying them in the real environment. Even so, there is a gap in the development between the simulation step and the implementation and deployment of the actual agents. To address this problem, a framework was developed, initially using the concepts of the aforementioned simulator, with the additional feature of employing the policies assessed during the simulation to drive the software agents that will manage the actual resources in the environment.

Contrary to most of the works related to MAS platforms presented in the previous sections, our approach was first developed to solve a domain-specific problem focused on self-management of resources. During the development, the generic components were extracted to allow more flexibility during the implementation and the test of the proposed solutions. This resulted in the general purpose ARPS framework to enable MAS.

The generalization of the scenarios to which our MAS framework can apply, although it is not limited to them, is illustrated in Figure 2. As can be seen, there is a complex system with resources (*Rn*), grouped by environments (*Environmentm*), that are affected by external events (*E*). The events occur at variable or fixed intervals in this system. Without any management, these resources can be in a suboptimal state. To overcome this problem, agents (*Ai*), driven by policies, are employed to monitor resources or modify them using available touchpoints. The agents can be reactive or proactive. They can act in isolation, or they can communicate with each other for cooperation, coordination, or negotiation.

**Figure 2.** Scenario for resources being modified by external events while agents manage them.

The framework is currently being developed in Python 3, a high-level multiplatform language that can be deployed on myriad host platforms, including Internet of Things (IoT) devices [34,35]. Users can install the framework from source code or as a Python package, available at https://pypi.org/project/arps/. This means that, when implementing the agents for a specific domain, the user needs to implement all the components in Python. Although there is no single programming language that can be applied effectively in every domain, Python has been suggested as a general programming language for the scientific community, and it has been used by researchers in areas not related to technology or engineering, such as psychology and astronomy [36,37].

#### *3.2. Architecture*

The ARPS framework is composed of four main components: the agent manager, the agents, the discovery service (yellow pages service), and the Discrete Event Simulator (DES). The architecture and relationship of the first three are illustrated in Figure 3. The agent manager has three main aspects: the management of the availability of resources and policies related to an environment, the agent life-cycle, and the simulation when it runs for that purpose. It acts as a container for the agents. This container groups resources logically by some criterion, like resource similarity or accessibility. Agent managers can be distributed across systems. The agent is an entity, driven by policies, that manages one or more resources. It can interact with all other agents made available by the discovery service. Since each agent manager can be deployed in a distributed manner, the discovery service is a directory where agents are registered upon creation and their location is made available to other requesters, so an agent from one agent manager can exchange messages with agents from other agent managers. Lastly, the DES is a component available for the evaluation of the agents; its integration with the others is explained in the following sections.

**Figure 3.** Architecture of the ARPS framework.

#### 3.2.1. Interoperability

The agent manager, agents, and discovery service implement the RESTful architectural style for interoperability. This is done using HTTP methods, with the payload in JSON format. This RESTful API employs uniform resource access using the format http://hostname:port/resource/, where *hostname:port* are the host name and listening port to access the API, respectively, and *resource* is a web resource. The HTTP methods (GET, POST, PUT, and DELETE) are available to clients. The payloads of the POST and PUT methods, as well as the possible HTTP response codes, are omitted for simplicity. Their descriptions are available at https://gitlab.com/arps/arps/wikis/home. Below we present the API used to perform the requests.
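As a small illustration of this uniform format, the snippet below builds resource URLs the way a client of the API would; the resource names used here (`agents`, and an `agent_manager` prefix as in the simulator mode described later) are assumptions for illustration, since the actual resource names are those listed in Tables 1–5.

```python
# Hedged sketch: compose URLs following the uniform format
# http://hostname:port/resource/ described above. The resource names
# are illustrative; the real ones are listed in Tables 1-5.

def resource_url(hostname, port, *path):
    """Compose a uniform resource URL for the RESTful API."""
    return f"http://{hostname}:{port}/" + "/".join(path) + "/"

# A GET on such a URL would, for instance, list the running agents,
# e.g. with the third-party 'requests' library:
#   requests.get(resource_url('localhost', 8000, 'agents')).json()
print(resource_url('localhost', 8000, 'agents'))
# → http://localhost:8000/agents/
```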

The agent manager API is intended to fulfill requests made by the user. The summary in Table 1 shows how to list the policies that can drive agents' behaviour, the agents currently running in its environment, and the touchpoints that agents can monitor/control; when agents are created solely to monitor resources, a timestamped log containing the state of those resources can also be retrieved.

The life-cycle of the agents is controlled by creation, modification, inspection, and termination methods, as seen in Table 2.


**Table 1.** REST API for environment.

**Table 2.** Agent Manager API for agent life-cycle.


Lastly, the REST API that is available only when the agent manager is created to run the simulator is shown in Table 3.

**Table 3.** Agent Manager API in simulator mode.


The agents' API is described in Table 4. It is accessed not only for message exchange among the agents, but can also be called by users or other applications.

**Table 4.** Agent API.



The API provided by the discovery service, shown in Table 5, is used to register, unregister, and list the running agents in the system.


**Table 5.** Discovery Service.

We are aware of the existence of FIPA-HTTP for making FIPA-compliant agents over HTTP [38]. This standard, however, only supports the POST method, with the semantics of the message embedded in multi-part content sent in the body of the message. This defeats the purpose of the semantics provided by the HTTP methods, and of the HTTP status codes already associated with each method. Studies that support this view of interoperability in MAS, using Resource Oriented Architecture (ROA), can be found in [39–41].

There are many advantages to using the RESTful architectural style for interoperability. The system is open, so applications that the system designers did not take into account a priori can be integrated into the existing system, since the web resources are accessed uniformly. Secure communication can be implemented over HTTPS. A cache can be used to save bandwidth and to optimize system communication. Visualization can be easily implemented by third-party entities across different devices. The MAS's support for the REST API can also be used to extend the API and make it compliant with FIPA-HTTP if required.

#### 3.2.2. Agent Model

The intelligent agent model, as described by Russell & Norvig [42], is used to define the agent. Thus, besides the conventional definition of an agent as an entity perceiving the environment through sensors and modifying it through actuators, it also has the components to achieve reasoning. To this end, the agent's behaviour is driven by a set of policies executed by its control loop. A policy can be activated either by interaction with other agents or internally at a fixed interval. Each policy has access to the touchpoints in the environment provided to the agent during its creation. Another characteristic is that policies can be added or removed dynamically.

The policies can be reactive or proactive. The reactive policy follows the Event Condition Action (ECA) rule. This model is used to create simple reflex agents. Thus, the agent monitors the environment, and, when the current state matches a condition, a predefined action is performed. Alternatively, the agents can optimize the environment continuously. In this case, their behaviour can be defined in terms of proactive policies. To this end, the user can extend the interface to implement goal-based or utility-based policies.
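The split between the control loop and the ECA-style policies described above can be sketched as follows. This is a simplified stand-in for the framework's Policy interface (whose actual structure is shown in Figure 6), and the `sensor`/`actuator` touchpoints are hypothetical objects injected when the agent is created.

```python
# Minimal sketch of the ECA policy model described in the text; the real
# ARPS Policy interface may differ. 'sensor' and 'actuator' stand in for
# the touchpoints provided to the agent during its creation.

class Policy:
    """Base class: the agent's control loop calls condition(), then action()."""
    def __init__(self, sensor, actuator):
        self.sensor = sensor
        self.actuator = actuator

    def condition(self, event):   # Event + Condition
        raise NotImplementedError

    def action(self):             # Action
        raise NotImplementedError

class HighLoadReflexPolicy(Policy):
    """Simple reflex policy: throttle when the monitored value exceeds a limit."""
    def condition(self, event):
        return self.sensor() > 80

    def action(self):
        self.actuator('throttle')

def control_loop_step(policy, event):
    """One step of the control loop: fire the action when the condition holds."""
    if policy.condition(event):
        policy.action()
        return True
    return False
```

A proactive (goal-based or utility-based) policy would override `condition(event)` with its own evaluation logic, as discussed in Section 4.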

Another characteristic is related to how agents are organized. The framework defines that an agent has a unidirectional relationship with other agents. This relationship is dynamic and can be created or removed during the agent's lifetime. Since this model does not impose any form of organization, and agents communicate using a peer-to-peer architecture, they can be organized hierarchically or horizontally. Agents can establish a relationship with any other agent available through the yellow pages service. Currently, there is no support for enforcing groups or roles.

#### 3.2.3. Simulation

During the simulation, only a single instance of an agent manager is necessary, since it acts as the gateway to all other virtual agent managers available in the system, as seen in Figure 4a. Similarly to the actual environment, the format used by the API is http://hostname:port/agent\_manager/resource/, where *agent*\_*manager* precedes *resource* to identify the virtual agent manager that will perform the request.

In the actual environment, the transport system used to exchange messages between agents is implemented using the HTTP protocol, as seen previously. It relies on the physical network to function. During the simulation, however, this protocol is substituted by a global bus that serves as the discovery service and the communication layer at the same time. The content sent by one agent is put directly into the message queue of the receiving agent. The difference between real and simulated communication is seen in Figure 4b. This has the advantage of removing the communication overhead. Additionally, it gives the user the freedom to apply models to this layer, in order to evaluate the reliability of the agents under intermittent communication, increased latency, or corrupted messages.
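The simulation-time substitution of HTTP by a global bus can be sketched as below. This is a simplified illustration of the mechanism, not the framework's actual implementation; the optional `channel_model` hook stands in for the user-supplied communication models mentioned above.

```python
# Simplified sketch of the simulation-time global bus: it acts as the
# discovery service and the transport at once, placing each message
# directly into the receiver's queue. Not the actual ARPS implementation.
from collections import deque

class GlobalBus:
    def __init__(self, channel_model=None):
        self.queues = {}                    # agent id -> message queue
        self.channel_model = channel_model  # optional model (loss, latency, ...)

    def register(self, agent_id):
        """Discovery role: make the agent reachable on the bus."""
        self.queues[agent_id] = deque()

    def send(self, sender_id, receiver_id, content):
        """Transport role: put the message directly into the receiver's queue."""
        message = (sender_id, content)
        if self.channel_model:              # e.g. drop or corrupt the message
            message = self.channel_model(message)
        if message is not None:
            self.queues[receiver_id].append(message)

    def receive(self, agent_id):
        queue = self.queues[agent_id]
        return queue.popleft() if queue else None
```

Passing a `channel_model` that occasionally returns `None`, for instance, would let a user evaluate the agents under intermittent communication.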

The resources are another component that has to be overridden by models describing how they behave in the real environment. For example, in our case, we can model the behaviour of the computational resources to have a certain load according to the tasks that arrive in the system. Our model also supports resources that affect other resources indirectly. Revisiting the previous example, we can create a model that increases the energy consumption based on the load in the system when the computational resources are utilized.

The main component of the DES is the events generator. It supports both deterministic and stochastic models. The former uses a log file containing the events (when the last event is completed, the simulation terminates), while the latter uses a stochastic generator implemented by the user (the termination needs to be invoked explicitly). Each event has two actions: a main action, executed every step while the event is still unfinished, and a post-action, executed when the event exits the system. Figure 4c shows a resource being modified by an event during the simulation, and Figure 4d shows the queue of deterministic events; together they illustrate the DES component working along with the agents.

**Figure 4.** Agent Manager in Simulator Mode.
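The deterministic event model just described (a main action executed every step while the event is unfinished, and a post-action when it exits) can be sketched as a minimal DES loop; the class and function names below are illustrative, not the framework's actual API.

```python
# Minimal deterministic DES sketch: each event runs its main action every
# step while unfinished, and its post action when it leaves the system.
# Names are illustrative; the actual ARPS classes differ.

class Event:
    def __init__(self, arrival, duration, main, post):
        self.arrival = arrival     # step at which the event enters the system
        self.remaining = duration  # steps until the event exits
        self.main = main
        self.post = post

def run(events):
    """Run until the last event completes (deterministic termination)."""
    step = 0
    pending = sorted(events, key=lambda e: e.arrival)
    active = []
    while pending or active:
        active += [e for e in pending if e.arrival <= step]
        pending = [e for e in pending if e.arrival > step]
        for e in active:
            e.main()               # main action, every step while unfinished
            e.remaining -= 1
        for e in active:
            if e.remaining <= 0:
                e.post()           # post action, when the event exits
        active = [e for e in active if e.remaining > 0]
        step += 1
    return step
```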

When finished, the results of the simulation are made available in CSV format. We have chosen this format since it is supported by a myriad of statistical tools, such as Pandas (https://pandas.pydata.org/), R (https://www.r-project.org/), or spreadsheet-like applications such as LibreOffice (https://www.libreoffice.org/). We believe these third-party tools are better equipped to fulfill the needs of the user.

#### **4. Demonstration of the Framework**

In this section, we illustrate how the ARPS framework can be used to solve the problem of resource management. In the next subsection, we describe the components that need to be implemented to enable the MAS. Following this, we present an example in a specific domain: the management of resources in data centres.

#### *4.1. ARPS Framework Usage*

The modelling of the MAS for resource management can be accomplished by following the steps:

1. Implement the resources, exposing their touchpoints through sensors and actuators.
2. Implement the policies that drive the agents' behaviour.
3. Model the external events that modify the resources, for use by the simulator.
4. Create the configuration files for the simulated and the real environments.
5. Simulate, refine the models, and deploy the agents in the actual environment.


The usage of the framework can be summarized as the implementation of some interfaces, and the creation of the configuration files needed by the agent managers and agents.

We have created the abstraction of the resources that provides the touchpoints to be accessed by sensors and actuators, seen in Figure 5.

Starting with the sensors and a minimal set of configuration files (files that will be described in detail later in this section), it is possible to execute the agents in the actual environment to collect data, since a set of monitoring policies related to the implemented resources is made available automatically. Therefore, it is possible to initially create agents with the sole purpose of gathering data periodically. These data provide insight into the actual resources, which will be essential when modelling the environment used by the simulator. The model can be updated later, when comparing the simulated environment with the actual environment.

**Figure 5.** Resources, Sensors, and Actuators structure diagrams.

Once the resources and sensors are implemented, the next step is the implementation of the actuators. The actuators are used by the policies that drive the behaviour of the agents. These policies can be created by implementing the Policy interface, illustrated by a reflex policy in Figure 6. Since our approach is based on autonomic computing concepts [43], other types of policies, employing utility functions or goal-based approaches, can be implemented. The method *condition(event)* can be implemented to use user-defined optimization algorithms that compute the ideal parameters and then use the results to modify the resources through their actuators. Goal-based approaches would require the user to model the states and use the method *condition(event)* to search for the best action to perform to reach a better state. Both kinds of policies would be executed as periodic policies, so the event is just a time event indicating that it is time for the policy to evaluate the current state and execute its actions based on the perceived environment.

**Figure 6.** Policy structure diagram.

Since the policies encapsulate the perception and modification of the resources, and these resources can be modified by external events, the next step is the modelling of these events. The arrival of events in the system can be stochastic or deterministic, and the event queue can be created by implementing the EventQueueLoader interface. The events are encapsulated in the SimEvent class, which in turn contains the SimTask that will be executed during the simulation. The SimTask *main* method should provide the behaviour that modifies a resource, while the *pos* method contains the finalization process to be executed after the task is finished, like releasing the resource. The relationship between the DES components is illustrated in Figure 7.

**Figure 7.** Simulator structure diagram.

Given the structure presented in Figure 8, the final step before running the MAS in the actual or simulated environment is the creation of the configuration files, in JSON format, referencing the implemented components previously described in this section. Both configuration files are similar in structure. The file *simulation.conf* contains the paths of the files related to the fake resources and the DES component classes, while the file *real.conf* contains only the actual resources. The remaining classes, like policies, sensors, and actuators, remain the same for both configuration files. Therefore, the transition of the agents from the simulated environment to the real environment can be done seamlessly.
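As a hedged illustration of this step, the two configuration files might be generated as below. The exact JSON keys used by ARPS are not reproduced here, so the field names are assumptions; only the split between the files (fake resources plus DES classes versus actual resources, with everything else shared) follows the description above.

```python
# Illustrative generation of the two configuration files described above.
# The key names ('policies', 'resources', 'event_queue_loader', ...) are
# assumptions; the real schema is documented in the ARPS wiki.
import json
import os
import tempfile

shared = {
    'policies': ['CPUMonitorPolicy', 'EnergyMonitorPolicy'],
    'sensors': ['cpu_sensor'],
    'actuators': ['cpu_freq_actuator'],
}

# simulation.conf: fake resources plus the DES component classes.
simulation_conf = dict(shared,
                       resources=['FakeCPU', 'FakeEnergyMonitor'],
                       event_queue_loader='DeterministicEventQueueLoader')

# real.conf: only the actual resources; policies, sensors, and actuators
# stay the same, which is what makes the transition seamless.
real_conf = dict(shared, resources=['CPU', 'EnergyMonitor'])

outdir = tempfile.mkdtemp()
for name, conf in [('simulation.conf', simulation_conf), ('real.conf', real_conf)]:
    with open(os.path.join(outdir, name), 'w') as f:
        json.dump(conf, f, indent=2)
```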

After the instantiation of the agent manager in simulation mode, using the *simulation.conf* configuration file, the REST API presented in the previous section can be used to create the agents, organize them, and run, stop, and collect the result of the simulation.

Based on the results, only the policy classes need to be modified. Then, using the *real.conf* configuration file, the agents can be deployed in the actual environment. According to the behaviour observed in the actual environment, improvements can be made to the models used by the simulated environment; thus, it is only necessary to modify the existing fake resources, simulation tasks, and policies. Incrementally, new resources, touchpoints, and policies can be added to the MAS to improve the system.

**Figure 8.** The minimal structure required.

#### *4.2. Example: Data Centre Management*

As previously discussed in Section 3, we are researching ways to optimize energy efficiency in the data centre, while ensuring that other performance metrics are met, using MAS. To illustrate this, we provide a minimal example covering the basic setup of the ARPS framework. The source code is available at https://gitlab.com/arps/arps/tree/main/arps/examples/computational\_resources\_management.

This example by no means provides an in-depth analysis of resource management in data centres; it only serves to illustrate how the MAS can be realized. To keep the example simple, we manage only two resources from a single host: the CPU and the Energy Monitor. Additional resources, such as environmental temperature sensors, the cooling system, and multiple hosts, can be added iteratively as the need for a better understanding of the system arises.

The CPU provides means both to read its current utilization and to modify its frequency, enabling dynamic frequency scaling through two governors: performance and powersave. The maximum frequency is 2.9 GHz on each of its four cores; in performance mode the frequency is adjusted dynamically, while in powersave mode it stays close to the minimum available frequency. The Energy Monitor only provides an interface to read the estimated power consumption in watts using the tool *PowerTOP* [44], developed by Intel.

To understand how energy consumption is affected when the CPU has a certain workload, we collected data using the sensors. As described in the previous section, when a resource is implemented, a monitoring policy related to it is made available by the agent manager. In this case, it is possible to create agents executing the CPUMonitorPolicy and EnergyMonitorPolicy policies periodically. The collected data are summarized in two charts, where the Y-axis represents the normalized values of the CPU workload and the estimated power, and the X-axis is the time in seconds. In Figure 9 the CPU uses the performance governor, while in Figure 10 it uses the powersave governor. As expected, there is a correlation between workload and energy consumption in performance mode, since the CPU adjusts its frequency during execution. The same does not apply in powersave mode, where the energy consumption almost never exceeds a certain value.

**Figure 9.** Correlation of energy consumption and CPU workload using performance governor.

**Figure 10.** Correlation of energy consumption and CPU workload using powersave governor.
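A user-supplied simulation model can capture this observed relationship with simple estimate functions of the following shape. The coefficients below are illustrative assumptions, not values fitted to our measurements; functions of this kind are what a simulated task (Figure 12) would call to update the energy monitor.

```python
# Illustrative energy-estimate models for the two governors, of the kind
# plugged into the simulator. The coefficients are assumptions for
# illustration, not values fitted to the collected data.

IDLE_POWER = 5.0  # watts drawn with no workload (assumed)

def performance_estimate(cpu_load):
    """Performance governor: consumption grows with the workload."""
    return IDLE_POWER + 0.1 * cpu_load

def powersave_estimate(cpu_load):
    """Powersave governor: consumption saturates at a low cap."""
    return min(IDLE_POWER + 0.1 * cpu_load, 7.0)
```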

To reduce energy consumption, we can, for example, use reactive policies implementing the method *condition(event)* of the Policy interface to evaluate the workload on the CPU. The action performed by the policy can be the adjustment of the CPU's frequency to a lower level. In this example, we considered two different strategies for saving energy. The strategies involve three policies, presented in Figure 11. The first policy adjusts the CPU to powersave mode when the utilization is over 80%, while the second sets the CPU to powersave mode when the utilization is over 50%. The third returns the CPU to performance mode when the utilization is under 50%. The first strategy is composed of the policies PowersaveDynamicScalingPolicy1 and PerformanceDynamicScalingPolicy; the second strategy uses PowersaveDynamicScalingPolicy2.

We have chosen batch jobs to represent the external tasks that modify the environment, in order to evaluate how the agents manage energy consumption. In this case, we opted to use a deterministic model of job arrival based on an external file of events, where each row represents one event arriving in the system, containing the arrival time, the duration of the task, and the workload. The SimTask *main* method, seen in Figure 12, allocates the CPU resource and consequently causes additional power consumption, proportional to the workload when the CPU is in performance mode and steadier when the CPU is in powersave mode, as mentioned previously. The SimTask *pos* method releases the resource as soon as the task is completed.

```
class PowersaveDynamicScalingPolicy1(Policy):
    def condition(self, event):
        return self.cpu.read() > 80

    def action(self):
        self.cpu_freq.set(governor='powersave', max_frequency=0.9)


class PowersaveDynamicScalingPolicy2(Policy):
    def condition(self, event):
        return self.cpu.read() > 50

    def action(self):
        self.cpu_freq.set(governor='powersave', max_frequency=0.9)


class PerformanceDynamicScalingPolicy(Policy):
    def condition(self, event):
        return self.cpu.read() <= 50

    def action(self):
        self.cpu_freq.set(governor='performance', max_frequency=2.9)
```
**Figure 11.** Example of implementation of a reactive policy.

```
class CPUTask(SimTask):
    def main(self):
        if not self.acquired:
            self.acquired = True
            self.cpu.value += self.workload
        if self.cpu.frequency == 'powersave':
            estimate = self.powersave_estimate(self.cpu.value)
        elif self.cpu.frequency == 'performance':
            estimate = self.performance_estimate(self.cpu.value)
        self.energy_monitor.value = estimate

    def pos(self):
        self.cpu.value -= self.workload
```
**Figure 12.** Example implementation of a task to update the energy monitor based on CPU workload.

The first analysis suggests that the average energy consumption with the first strategy is 9 W, while for the second it is 7.57 W. Without taking into account any other resources correlated with energy consumption and CPU usage, we can infer that the second strategy is better. However, other resources or metrics, such as a Quality of Service (QoS) metric, can be affected by the CPU dynamic frequency scaling of strategy two; these could be included in the model for further investigation.

When comparing the policies in the real environment, similar behaviour is observed. The average energy consumption with the first policy is 13.75 W, while with the second it is 12.34 W. These results come from a very limited set of experiments and comparisons, so no significance can be determined. Again, this was meant as an example to illustrate how the ARPS framework can be used.

#### **5. Discussion**

We presented ARPS, a framework enabling the evaluation of a MAS before its deployment in an actual environment. This framework is the result of generalizing a MAS solution, developed to be as flexible as possible, for the management of a data centre. It has proven useful in a specific domain, and we believe that others in the MAS community can benefit from this additional option when developing solutions to their problems. To this end, we illustrated how it can be applied using our specific case.

Due to the decision to use the RESTful architectural style, agents can be deployed manually by a user in an environment containing a single agent, without the need for an agent manager. This unintended feature is possible because agents are self-contained, and all operations are made available through the API. The interaction can be done using web clients widely available on the Internet. The only requirement is that the platform can run Python applications. Thus, a user can instantiate an agent to control an autonomous robot built on a platform such as a Raspberry Pi, or deploy an agent on the devices of an IoT environment.

We have identified other features that could enrich this framework. The inclusion of a policy management tool could ease the process of creating new policies dynamically using a high-level language. Also, new resources and their touchpoints could be made available on the fly, and unavailable resources could be removed dynamically. This could further improve the integration of simulation and deployment. We plan to add a learning component to make agents more robust in complex adaptive environments. Further, reliability characteristics, such as fault tolerance, and security concerns are being addressed and will be included in future versions.

Currently, the framework has some limitations related to features that were set aside due to low priority. The framework does not come with a visualization component; thus, it is not possible to visualize the running agents, either during the simulation or in the real environment. Also, because we were aiming to have actual agents, we did not evaluate the usage of this framework on problems entailing only agent-based modelling and simulation (ABMS). Since the ABMS and MAS concepts overlap, this MAS framework could be adapted for use as an ABMS tool. We do not know, however, how it performs when millions of agents are deployed in the environment. The framework does not include statistical tools to analyze the results of the simulation or to assess the system in the actual environment, since we believe that each user has their own needs and preferred tools. As mentioned, all the data gathered via the framework are made available in CSV format, so other applications that provide more suitable tools can be used to extract insightful information from the data. Lastly, the framework is not yet FIPA compliant.

**Author Contributions:** Conceptualization, T.C.P. and M.B.; Methodology, T.C.P. and M.B.; Software, T.C.P.; Supervision, M.B.; Visualization, T.C.P.; Writing—original draft, T.C.P. and M.B.; Writing—review & editing, T.C.P. and M.B.

**Funding:** This research is partially supported by CNPq, Brazil grant number 249237/2013-0.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Consensus Algorithms Based Multi-Robot Formation Control under Noise and Time Delay Conditions**

#### **Heng Wei 1,\*, Qiang Lv 1, Nanxun Duo 1, GuoSheng Wang <sup>1</sup> and Bing Liang <sup>2</sup>**


Received: 2 January 2019; Accepted: 6 March 2019; Published: 11 March 2019

**Abstract:** In recent years, the formation control of multi-mobile robots has been widely investigated by researchers. With increasing numbers of robots in a formation, distributed formation control has become the development trend of multi-mobile robot formation control, and the consensus problem is the most basic problem in distributed multi-mobile robot control algorithms. Therefore, it is very important to analyze the consensus of multi-mobile robot systems. There are already mature and sophisticated strategies for solving the consensus problem in ideal environments. However, in practical applications, uncertain factors like communication noise, communication delay and measurement errors still lead to many problems in multi-robot formation control. In this paper, the consensus problem of second-order multi-robot systems with multiple time delays and noises is analyzed. The characteristic equation of the system is transformed into a quadratic polynomial of pure imaginary eigenvalues using the frequency domain analysis method, and the critical stability state at the maximum time delay under noisy conditions is then obtained. When all robot delays are less than the maximum time delay, the system can be stabilized and achieve consensus. Compared with the traditional Lyapunov method, this algorithm is less conservative, and it is easier to extend the results to higher-order multi-robot systems. Finally, the results are verified by numerical simulation using MATLAB/Simulink. At the same time, a multi-mobile robot platform is built, and the proposed algorithm is applied to an actual multi-robot system. The experimental results show that the proposed algorithm achieves consensus of the second-order multi-robot system under delay and noise interference.

**Keywords:** multi-robot; consensus problem; formation control; noise; time delay

#### **1. Introduction**

In recent years, with the continuous development of computer science, complex network theory and control theory, autonomous mobile robots have received more and more attention [1]. Compared to single mobile robots, multi-mobile robot systems have better stability, higher fault tolerance and higher work efficiency. As a result, they have better application prospects and higher research value in the fields of reconnaissance, patrol, rescue and environmental survey. Formation control of multi-mobile robots is the basis of multi-mobile robot systems, and has become a hotspot in the field of robotics [2].

In the design of a multi-robot formation control algorithm, many problems need to be considered, including the robot model, external environmental interference, sensor measurement noise, control precision, and the controllability of different formations [3]. The existing formation control algorithms for multi-robots mainly include the leader-follower algorithm [4], the behavior-based algorithm [5], the graph theory-based method [6], the virtual structure method [7], and the artificial potential field method. The leader-follower algorithm offers a flexible motion strategy and good scalability, but it cannot form stable and reliable feedback between the follower and the leader, so the control error of the follower grows under environmental interference. In particular, when the leader fails, the entire multi-robot system can crash. The behavior-based algorithm can effectively reduce the complexity of the formation control algorithm, but it places higher demands on sensing ability and inter-robot communication, and cannot accurately quantify the behavior of robots during operation; thus, it is difficult to guarantee the system's robustness. The virtual structure method is convenient for designing the formation behavior of multi-robot systems, but, due to the constraints of its rigid structure, it lacks flexibility with respect to obstacle avoidance and formation transformation. The artificial potential field algorithm has a simple structure and can effectively avoid collisions and obstacles, but it is susceptible to interference when maintaining the formation, and precise formation control is difficult. Moreover, the potential energy function needs to be reset whenever the formation is transformed, leading to a lack of flexibility.

In view of the shortcomings of the traditional formation control algorithms, and considering the increasing number of robots in multi-robot systems and the continuous improvement of the data processing capability of a single robot, the distributed multi-robot control algorithm has attracted the attention of researchers. A distributed multi-robot system can make full use of the data processing resources of each robot and share the load of the central processing machine, which has great advantages in terms of flexibility and fault tolerance [8,9]. Solving the consensus problem is the core of the distributed multi-robot control algorithm [10]. There are already mature and sophisticated strategies for solving the consensus problem in ideal environments [11,12]. However, in practical applications, uncertain factors like communication noise, communication delay and measurement error will still lead to many problems in multi-robot formation control. Several algorithms have addressed such practical conditions. Reference [13] studied the conditions for the system to reach consensus under uniform delay, when the communication structure of the second-order multi-robot system is a directed graph with a spanning tree or a strongly connected graph, respectively. However, that paper does not consider noise or consensus under different delay conditions. Reference [14] studied the consensus problem of second-order multi-robot systems under noisy conditions. A control protocol based on distributed sampling data was proposed to achieve system consensus, but the delay condition was not taken into account in the algorithm. Reference [15] studied the consensus of second-order multi-robot systems under non-uniform and multiple time delays using the frequency domain analysis method. Compared with the Lyapunov method, it has lower conservativeness, and the results were extended to higher-order multi-robot systems.
However, it did not take noise into consideration, which is unavoidable in practical environments. Reference [16] studied the consensus of second-order multi-robot systems under uniform time delay and noise environments, and designed different control protocols for different types of noise, thus achieving the consensus of the system. These algorithms provide some basic solutions to the second-order system consensus problem, but the problems encountered by multi-robots in practical applications are far more varied than these. On the basis of these algorithms, this paper performs a more in-depth analysis, especially considering the consensus of the second-order system in which there are many different time delays and multiplicative noises in the system, laying the foundations for a formation control algorithm for second-order multi-robot systems that can be truly implemented in real robot systems.

In summary, this paper analyzes the consensus problem of second-order multi-robot systems under various delay and noise conditions. The system characteristic equations are transformed into quadratic polynomials of pure imaginary eigenvalues based on frequency domain analysis, and then solved; finally, the critical steady state is obtained and verified using MATLAB numerical simulation. Compared with existing algorithms, this algorithm has lower conservativeness, and it is easier to extend the results to higher-order multi-robot systems. Since the omnidirectional mobile robot is a fully actuated robot whose horizontal and vertical directions can be controlled separately, it can be modeled as two one-dimensional second-order multi-robot systems. Therefore, experiments were carried out on a multi-omnidirectional mobile robot platform built in the laboratory using the proposed algorithm [17,18], which verifies the effectiveness of the proposed algorithm.

#### **2. Preliminaries and Problem Description**

#### *2.1. Graph Theory*

*G* = {*V*, *E*} represents the communication topology between robots, in which each robot is a node. *V* is the set of nodes and *E* is the set of edges, representing the communication links between robots. The topology is described by the Laplacian matrix *L* = *D* − *A*, where *D* is the degree matrix, which records how many nodes are adjacent to each node, and *A* = [*aij*] is the adjacency matrix, *i*, *j* ∈ *V*. *Ni* denotes the set of all nodes adjacent to node *i*. If node *j* is adjacent to node *i*, then *aij* > 0. If *aij* = *aji* for any *i*, *j* ∈ *V*, the graph is undirected; otherwise, it is directed. If there is a directed path between any two nodes in the graph, the directed graph *G* is strongly connected. If there is a node from which a directed path leads to every other node, then the directed graph *G* contains a spanning tree. A strongly connected undirected graph is called a connected graph. When the undirected graph *G* is connected, its Laplacian matrix *L* has a single zero eigenvalue, and the other eigenvalues are positive real numbers. When a directed graph *G* contains a spanning tree, its Laplacian matrix *L* has a single zero eigenvalue, and the real parts of the remaining eigenvalues are positive.
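These spectral properties are easy to verify numerically; below is a minimal sketch for an assumed 4-robot undirected ring topology (the ring itself is an illustration, not a topology taken from the paper):

```python
import numpy as np

# Adjacency of an assumed 4-robot undirected ring (a_ij = a_ji = 1 for neighbors)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # Laplacian L = D - A

eigvals = sorted(np.linalg.eigvals(L), key=lambda z: z.real)
# Connected undirected graph: exactly one zero eigenvalue, the rest positive reals
print(np.round(np.real_if_close(np.array(eigvals)), 6))
```

For this ring the spectrum is {0, 2, 2, 4}, matching the stated property.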

#### *2.2. Problem Description*

Suppose the system consists of n omnidirectional robots. The dynamic characteristics of the omnidirectional robot in the x direction are:

$$\begin{cases}
\dot{x}\_i(t) = v\_i(t) \\
\dot{v}\_i(t) = u\_i(t)
\end{cases} \tag{1}$$

where *xi*(*t*) is the position, *vi*(*t*) is the velocity and *ui*(*t*) is the control input of robot *i*. If any robots *i* and *j* in the multi-robot system satisfy the following conditions:

$$\lim\_{t \to +\infty} \left[ x\_i(t) - x\_j(t) \right] = 0 \tag{2}$$

$$\lim\_{t \to +\infty} [v\_i(t) - v\_j(t)] = 0 \tag{3}$$

then the multi-robot system (1) achieves consensus under the control protocol *ui*(*t*). Let the state vector of robot *i* be *δi*(*t*) = [*xi*(*t*), *vi*(*t*)]*<sup>T</sup>*; then the multi-robot system state vector is *S*(*t*) = [*δ*1(*t*), *δ*2(*t*), ... , *δn*(*t*)]*<sup>T</sup>*. Rewrite system (1) as:

$$
\dot{\mathbf{S}}(t) = \mathbf{\Psi}\mathbf{S}(t) \tag{4}
$$

where $\Psi = \mathbf{I} \otimes \mathbf{A} - \mathbf{L} \otimes \mathbf{B}$, $\mathbf{A} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $\mathbf{B} = \begin{pmatrix} 0 & 0 \\ k\_1 & k\_2 \end{pmatrix}$, and ⊗ denotes the Kronecker product. In the ideal case without noise and delay, the control protocol designed in [13] is as follows:

$$u\_i(t) = \sum\_{j \in N\_i} a\_{ij} \left\{ k\_1[\mathbf{x}\_i(t) - \mathbf{x}\_j(t)] + k\_2[v\_i(t) - v\_j(t)] \right\} \tag{5}$$

where *aij* > 0 is the communication topology weight between robot *i* and robot *j*, *k*<sup>1</sup> is the position scale factor to be designed, and *k*<sup>2</sup> is the velocity scale factor to be designed. Lemmas 1 and 2 give the conditions that the coefficient matrix *Ψ* of control protocol (5) must satisfy when the communication topology of system (4) is an undirected graph and a directed graph, respectively.

**Lemma 1.** *When the communication topology of multi-robot system (4) is a connected graph, the coefficient matrix Ψ has a double zero root, and the real parts of the other eigenvalues are negative*.

**Proof.** Let there be an orthogonal matrix *Q*, such that:

$$\mathbf{Q}^T \mathbf{L} \mathbf{Q} = \text{diag}\{0, \lambda\_2, \lambda\_3, \dots, \lambda\_n\} \tag{6}$$

where 0, *λ*2, *λ*3, ... , *λ<sup>n</sup>* are the eigenvalues of the Laplacian matrix *L*, and *λ<sup>i</sup>* > 0 (*i* = 2, 3, ... , *n*). Formula (7) is obtained from Formula (6):

$$(\mathbf{Q}\otimes\mathbf{I}\_2)^T\mathbf{\Psi}(\mathbf{Q}\otimes\mathbf{I}\_2) = \text{diag}\{\mathbf{A}, \mathbf{A} - \lambda\_2\mathbf{B}, \mathbf{A} - \lambda\_3\mathbf{B}, \dots, \mathbf{A} - \lambda\_n\mathbf{B}\}\tag{7}$$

Taking the characteristic determinant of Formula (7) gives:

$$\left|s\mathbf{I}\_{2n} - \text{diag}\left\{\mathbf{A}, \mathbf{A} - \lambda\_2\mathbf{B}, \dots, \mathbf{A} - \lambda\_n\mathbf{B}\right\}\right| = s^2 \prod\_{i=2}^{n} \left(s^2 + \lambda\_i k\_2 s + \lambda\_i k\_1\right) = 0 \tag{8}$$

Because of the factor *s*<sup>2</sup> in Formula (8), the eigenvalues must include a double zero root. Solving the polynomial equation *s*<sup>2</sup> + *λik*2*s* + *λik*<sup>1</sup> = 0 gives:

$$s\_1 = \frac{-\lambda\_i k\_2 + \sqrt{(\lambda\_i k\_2)^2 - 4\lambda\_i k\_1}}{2}$$

$$s\_2 = \frac{-\lambda\_i k\_2 - \sqrt{(\lambda\_i k\_2)^2 - 4\lambda\_i k\_1}}{2}$$

Based on this analysis, when (*λik*2)<sup>2</sup> > 4*λik*1, obviously $-\lambda\_i k\_2 \pm \sqrt{(\lambda\_i k\_2)^2 - 4\lambda\_i k\_1} < 0$, so the eigenvalues *s*<sup>1</sup> and *s*<sup>2</sup> are negative. When (*λik*2)<sup>2</sup> < 4*λik*1, because *λik*<sup>2</sup> > 0, we have −*λik*<sup>2</sup> < 0, so the eigenvalues *s*<sup>1</sup> and *s*<sup>2</sup> have negative real parts. Lemma 1 is proved. □
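Lemma 1 can be checked numerically by forming Ψ = I ⊗ A − L ⊗ B for a small connected topology and inspecting its spectrum; the gains k1 = k2 = 1 and the 4-robot ring below are illustrative assumptions, not values from the paper:

```python
import numpy as np

k1, k2 = 1.0, 1.0                        # assumed protocol gains
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [k1, k2]])

Adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)   # illustrative 4-robot ring
L = np.diag(Adj.sum(axis=1)) - Adj

n = L.shape[0]
Psi = np.kron(np.eye(n), A) - np.kron(L, B)   # coefficient matrix of system (4)
eigs = np.linalg.eigvals(Psi)

# Lemma 1: a double zero root; all other eigenvalues in the open left half-plane.
# (A loose threshold is used because the defective zero root is computed
# only to about sqrt(machine epsilon).)
zero_roots = int(np.sum(np.abs(eigs) < 1e-5))
others_stable = bool(np.all(np.real(eigs[np.abs(eigs) > 1e-5]) < 0))
print(zero_roots, others_stable)
```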

**Lemma 2.** *When the communication topology of multi-robot system (4) is a directed graph containing a spanning tree and $k\_1 \in (0, k\_0 k\_2^2)$, the coefficient matrix Ψ has a double zero root, and the real parts of the other eigenvalues are negative, where $k\_0 = \min\_{\mathrm{imag}(\lambda\_i) \neq 0} \frac{|\lambda\_i|^2\,\mathrm{real}(\lambda\_i)}{\mathrm{imag}(\lambda\_i)^2}$.*

**Proof.** The characteristic determinant of system (4) is obtained by Formula (8), assuming that there are polynomial equations:

$$s^2 + s(a+bj) + k(a+bj) = 0\tag{9}$$

where *a* > 0, *k*, *a*, *b* ∈ *R*. Let *s* = *jw*:

$$-w^2 - bw + ka + (aw + kb)j = 0\tag{10}$$

Solving Formula (10), we can get:

$$\begin{cases} -w^2 - bw + ka = 0\\ aw + kb = 0 \end{cases} \tag{11}$$

which yields the two solutions:

$$\begin{cases} k = 0 \\ k = \frac{a(a^2 + b^2)}{b^2} \end{cases} \tag{12}$$

Reference [13] proves that when $0 < k < \frac{a(a^2+b^2)}{b^2}$, the roots of Formula (9) all lie in the open left half-plane. Formula (8) is modified according to Formula (9):

$$s^2\prod\_{i=2}^n \left(s^2 + \lambda\_i k\_2 s + \lambda\_i \frac{k\_1}{k\_2} k\_2\right) = 0\tag{13}$$

The analysis shows that when $k\_0 = \min\_{\mathrm{imag}(\lambda\_i) \neq 0} \frac{|\lambda\_i|^2\,\mathrm{real}(\lambda\_i)}{\mathrm{imag}(\lambda\_i)^2}$ and $k\_1 \in (0, k\_0 k\_2^2)$, the coefficient matrix *Ψ* has a double zero root, and the real parts of the other eigenvalues are negative. Lemma 2 is proved. □

#### **3. Consensus Analysis of Multi-Robot with Various Delays and Noise Conditions**

In the previous section, we analyzed the conditions under which second-order systems achieve consensus in an ideal environment. However, in real environments, due to noise interference and communication differences between different robot hardware, the above control protocols need to be improved. Assuming that there are *D* different kinds of delays in the system, multi-robot system (4) becomes:

$$\dot{\mathbf{S}}(t) = (\mathbf{I} \otimes \mathbf{A})\,\mathbf{S}(t) - \sum\_{d=1}^{D} \left(\mathbf{L}\_d \otimes \mathbf{B}\right)\zeta(t)\,\mathbf{S}(t - \tau\_d) \tag{14}$$

where *ζ*(*t*) is the communication noise or measurement noise between robots, *τij* is the transmission delay, which represents the time taken by robot *i* to receive and process information transmitted by robot *j*, *L<sup>d</sup>* is the Laplacian matrix corresponding to the sub-topology of the robot nodes whose delay is *τd*, and $\sum\_{d=1}^{D} L\_d = L$.

**Theorem 1.** *If system (14) is a connected graph, the system can achieve consensus when the system delay τ<sup>d</sup> is less than τ*max *under the action of noise ζ*(*t*)*. Among them:*

$$\begin{cases} \tau\_{\max} = \left[ \arctan\left( \frac{k\_2}{k\_1} w\_{\max} \right) \right] / w\_{\max}\\ w\_{\max} = \sqrt{\dfrac{\lambda\_{\max}^2 k\_2^2 \zeta^2(t) + \zeta(t) \sqrt{\zeta^2(t) \left( \lambda\_{\max}^2 k\_2^2 \right)^2 + 4 \lambda\_{\max}^2 k\_1^2}}{2}} \end{cases} \tag{15}$$

**Proof.** Using the frequency domain analysis method for analysis, the Laplace transform of Equation (14) can be obtained:

$$\mathbf{S}(s) = \left(s\mathbf{I}\_{2n} - (\mathbf{I}\_n \otimes \mathbf{A}) + \sum\_{d=1}^{D} (\mathbf{L}\_d \otimes \mathbf{B})\zeta(t)e^{-\tau\_d s}\right)^{-1}\mathbf{S}(0) \tag{16}$$

Let $\mathbf{G}\_\tau(s) = s\mathbf{I}\_{2n} - (\mathbf{I}\_n \otimes \mathbf{A}) + \sum\_{d=1}^{D} (\mathbf{L}\_d \otimes \mathbf{B})\zeta(t)e^{-\tau\_d s}$; the roots of the determinant $|\mathbf{G}\_\tau(s)| = 0$ are the eigenvalues of the system. Lemma 1 gives the conditions under which multi-robot system (4) achieves consensus. The question is therefore how, relative to system (4), the eigenvalues of system (14) can be kept in the left half-plane under the interference of time delay *τ* and noise *ζ*(*t*). Because measurement noise and communication noise are uncertain in real environments, accurate quantitative analysis is impossible. Instead, we determine the delay *τ* at which, under the action of noise *ζ*(*t*), a non-zero eigenvalue of the system appears on the imaginary axis for the first time; this delay is the critical value at which the system remains stable.

Assuming that an eigenvalue of the system lies on the imaginary axis, let *s* = *jw* be this eigenvalue; then *α* = *α*<sup>1</sup> ⊗ [1, 0]*<sup>T</sup>* + *α*<sup>2</sup> ⊗ [0, 1]*<sup>T</sup>* is the corresponding eigenvector, with ‖*α*‖ = 1 and *α*1, *α*<sup>2</sup> ∈ C*<sup>n</sup>*; then:

$$\left[jw\mathbf{I}\_{2n} - (\mathbf{I}\_n \otimes \mathbf{A}) + \sum\_{d=1}^{D} (\mathbf{L}\_d \otimes \mathbf{B})\zeta(t)e^{-jw\tau\_d}\right]\alpha = 0\tag{17}$$

The imaginary eigenvalues of the system appear in conjugate pairs; this paper only analyzes the case where *w* > 0. Left-multiplying Formula (17) by *α<sup>H</sup>* gives:

$$\alpha^H \left[ jw\mathbf{I}\_{2n} - (\mathbf{I}\_n \otimes \mathbf{A}) + \sum\_{d=1}^D (\mathbf{L}\_d \otimes \mathbf{B}) \zeta(t) e^{-jw\tau\_d} \right] \alpha = 0 \tag{18}$$

Because each row of the left-hand matrix product of Formula (17) equals zero, the first *n* block rows give *jwα*<sup>1</sup> = *α*2; substituting this into Formula (18):

$$\sum\_{d=1}^{D} \beta\_d \zeta(t) e^{-jw\tau\_d} = \frac{w^2}{k\_1 + jwk\_2} \tag{19}$$

where $\beta\_d = \frac{\alpha^H(\mathbf{L}\_d \otimes \mathbf{I}\_2)\alpha}{\alpha^H\alpha}$. Taking the complex conjugate of both sides of Formula (19) and denoting the result *F*(*w*):

$$F(w) = \sum\_{d=1}^{D} \beta\_d \zeta(t) e^{jw\tau\_d} = \frac{w^2}{k\_1 - jwk\_2} \tag{20}$$

Taking the modulus of both sides of the above equation:

$$\|F(w)\| = \left\|\sum\_{d=1}^{D} \beta\_d \zeta(t) e^{-jw\tau\_d}\right\| < \left\|\sum\_{d=1}^{D} \beta\_d \zeta(t)\right\| = \frac{\alpha^H (\mathbf{L} \otimes \mathbf{I}\_2) \alpha}{\alpha^H \alpha} \zeta(t) \le \lambda\_{\text{max}} \zeta(t) \tag{21}$$

Let $w\_{\max} = \sqrt{\frac{\lambda\_{\max}^2 k\_2^2 \zeta^2(t) + \zeta(t)\sqrt{\zeta^2(t)(\lambda\_{\max}^2 k\_2^2)^2 + 4\lambda\_{\max}^2 k\_1^2}}{2}}$; then *w* ≤ *w*max and the above formula holds. From Formula (20):

$$\theta(w) = \mathrm{argz}[F(w)] = \arctan\left(\frac{k\_2}{k\_1}w\right) \tag{22}$$

where *θ*(*w*) ∈ [0, 2*π*). Let *τ*(*w*) = *θ*(*w*)/*w* and *a* = *k*2/*k*1. Differentiating *τ*(*w*), we obtain:

$$M\_1(w) = \frac{d\tau(w)}{dw} = \frac{1}{w^2} M\_2(w) = \frac{1}{w^2} \left[ \frac{aw}{a^2 w^2 + 1} - \arctan(aw) \right] \tag{23}$$

Differentiating *M*2(*w*), we obtain:

$$\frac{dM\_2(w)}{dw} = -\frac{2a^3 w^2}{\left(a^2 w^2 + 1\right)^2} < 0\tag{24}$$

*M*2(*w*) is decreasing, so when *w* > 0, *M*2(*w*) < *M*2(0) = 0, and thus *M*1(*w*) < 0; that is, *τ*(*w*) is also decreasing. Therefore *τ*(*w*) ≥ *τ*(*w*max) = *τ*max. When *τ<sup>d</sup>* < *τ*max, we can get:

$$\tau(w) = \frac{\theta(w)}{w} = \frac{\mathrm{argz}\left(\sum\_{d=1}^{D} \beta\_d \zeta(t) e^{jw\tau\_d} \right)}{w} \le \frac{\max[w\tau\_d]}{w} < \frac{w\tau\_{\max}}{w} = \tau\_{\max} \tag{25}$$

This contradicts *τ*(*w*) ≥ *τ*(*w*max). Therefore, when *τ<sup>d</sup>* < *τ*max, the eigenvalues of the system remain in the left half-plane, and the consensus of system (14) can be achieved. Theorem 1 is proved. □
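The bound of Formula (15) is straightforward to evaluate numerically. The sketch below implements it as stated; the gains k1 = k2 = 1 in the example call are an assumption for illustration (the gains used in Section 4 are not stated), so the printed value need not match the τmax quoted there:

```python
import numpy as np

def tau_max(lam_max, k1, k2, zeta):
    """Critical delay bound of Formula (15); zeta is the maximum noise amplitude."""
    p = lam_max ** 2 * k2 ** 2                       # lambda_max^2 * k2^2
    w_max = np.sqrt((p * zeta ** 2
                     + zeta * np.sqrt(zeta ** 2 * p ** 2
                                      + 4 * lam_max ** 2 * k1 ** 2)) / 2)
    return float(np.arctan(k2 / k1 * w_max) / w_max)

# lam_max = 4 and noise amplitude 2 as in Experiment 1; gains k1 = k2 = 1 assumed.
print(round(tau_max(4.0, 1.0, 1.0, 2.0), 3))
```

As expected from the monotonicity argument in the proof, the bound shrinks as either the noise amplitude or the largest Laplacian eigenvalue grows.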

**Theorem 2.** *If system (14) is a directed graph containing a spanning tree, the system can achieve consensus when every delay τ<sup>d</sup> is smaller than τ*max *under the action of noise ζ*(*t*) *and* $k\_1 \in (0, k\_0 k\_2^2)$*. Among them:*

$$\begin{cases} \tau\_{\max} = \min\_{\lambda\_i \neq 0}\left[\left[\arctan\left(\frac{k\_2}{k\_1} w\_i\right) - \mathrm{argz}(\lambda\_i)\right] / w\_i\right] \\ w\_i = \sqrt{\dfrac{\lambda\_i^2 k\_2^2 \zeta^2(t) + \zeta(t)\sqrt{\zeta^2(t)\left(\lambda\_i^2 k\_2^2\right)^2 + 4\lambda\_i^2 k\_1^2}}{2}} \end{cases} \tag{26}$$

*where* $\mathrm{argz}(\lambda\_i) \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$.

**Proof.** Lemma 2 proved that when the communication topology of multi-robot system (4) is a directed graph containing a spanning tree and $k\_1 \in (0, k\_0 k\_2^2)$, the coefficient matrix *Ψ* has a double zero root and the real parts of the other eigenvalues are negative. The same frequency domain analysis is performed. As in the proof of Theorem 1, we determine the delay *τ* at which, under the action of noise *ζ*(*t*), a non-zero eigenvalue of the system first appears on the imaginary axis; this delay is the critical value at which the system remains stable. Taking the modulus of Formula (20):

$$\|F(w)\| = \left\|\frac{w^2}{k\_1 - jwk\_2}\right\|\tag{27}$$

Solving the above formula for *w* in terms of ‖*F*(*w*)‖, it can be written as follows:

$$w = \sqrt{\frac{\|F(w)\|^2 k\_2^2 + \sqrt{\left(\|F(w)\|^2 k\_2^2\right)^2 + 4\|F(w)\|^2 k\_1^2}}{2}} \tag{28}$$

Thus *w* is an increasing function of ‖*F*(*w*)‖. From Formula (20):

$$\begin{cases} \mathrm{argz}[F(w)] = \arctan(\frac{k\_2}{k\_1}w) \\\\ \mathrm{argz}[F(w)] \le \mathrm{argz}\left(\sum\_{d=1}^D \beta\_d\right) + \max(w\tau\_d) \end{cases} \tag{29}$$

So:

$$
\arctan\left(\frac{k\_2}{k\_1}w\right) - \mathrm{argz}\left(\sum\_{d=1}^D \beta\_d\right) \le \max(w\tau\_d) \tag{30}
$$

Because $\beta\_d = \frac{\alpha^H(\mathbf{L}\_d \otimes \mathbf{I}\_2)\alpha}{\alpha^H\alpha}$, we have $\sum\_{d=1}^{D} \beta\_d = \lambda\_i$, where *λ<sup>i</sup>* is a non-zero eigenvalue of the Laplacian matrix *L*. Therefore ‖*F*(*w*)‖ ≤ *ζ*(*t*)‖*λi*‖. Because *w* is an increasing function of ‖*F*(*w*)‖:

$$w(\|F(w)\|) \le w(\|\zeta(t)\lambda\_i\|) = w\_i = \sqrt{\frac{\lambda\_i^2 k\_2^2 \zeta^2(t) + \zeta(t)\sqrt{\zeta^2(t)(\lambda\_i^2 k\_2^2)^2 + 4\lambda\_i^2 k\_1^2}}{2}} \tag{31}$$

When *τ<sup>d</sup>* < *τ*max, we can get:

$$\begin{split} \max(w\tau\_d) < w\_i \tau\_{\max} &= \min\left[\left[\arctan\left(\frac{k\_2}{k\_1} w\_i\right) - \mathrm{argz}\left(\sum\_{d=1}^D \beta\_d\right)\right] / w\_i\right] w\_i \\ &\le \arctan\left(\frac{k\_2}{k\_1} w\right) - \mathrm{argz}\left(\sum\_{d=1}^D \beta\_d\right) \end{split} \tag{32}$$

This contradicts Formula (30); therefore, when *τ<sup>d</sup>* < *τ*max, the eigenvalues of the system remain in the left half-plane, and the consensus of system (14) can be achieved. Theorem 2 is proved. □

#### **4. Simulation Verification**

In this section, two sets of MATLAB/Simulink numerical simulation experiments are carried out to verify the consensus described in Theorems 1 and 2, with undirected and directed communication topologies respectively, under conditions of noise and various delays.

**Experiment 1.** *Let system (14) consist of four robots whose communication topology is shown in Figure 1*.

**Figure 1.** Experiment 1 system communication topology.

As can be seen from Figure 1, the time delay between robots 1 and 2 is *τ*1, between robots 2 and 3 it is *τ*2, between robots 3 and 4 it is *τ*1, and between robots 4 and 1 it is *τ*2. If each adjacent communication weight *aij* is 1, then the Laplacian matrix *L* is:

$$L = \begin{pmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{pmatrix} \tag{33}$$

We can get *λ*max = 4. Assume that the communication noise or measurement noise is white noise with a maximum amplitude of two. According to Theorem 1, *τ*max = 0.226.

In the first group of Experiment 1, set *τ*<sup>1</sup> = 0.21, *τ*<sup>2</sup> = 0.22, and the initial posture is assumed to be (1,0), (2,0), (3,0), (4,0). The simulation results are shown in Figure 2.

To verify Theorem 1 and compare with the first group of experiments, in the second group of experiments, set *τ*<sup>1</sup> = 0.23, *τ*<sup>2</sup> = 0.24 under the same conditions. The simulation results are shown in Figure 3.

According to Experiment 1, system (14) satisfying Lemma 1 can achieve consensus when all *τ<sup>d</sup>* are less than *τ*max, and diverges, failing to reach consensus, when all *τ<sup>d</sup>* are greater than *τ*max; thus Theorem 1 is verified.
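The behavior seen in this experiment can be reproduced with a simple forward-Euler integration of system (14). In the sketch below the noise term is dropped (ζ ≡ 1) to keep the run deterministic, a single delay is used instead of two, and the gains k1 = k2 = 1 are assumed, so this is an illustration of the mechanism rather than a replication of the Simulink setup:

```python
import numpy as np

dt, T, tau = 0.001, 30.0, 0.05       # step, horizon, single assumed delay
steps, lag = int(T / dt), int(tau / dt)

k1, k2 = 1.0, 1.0                    # assumed gains
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [k1, k2]])
Adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)   # ring topology of Figure 1
L = np.diag(Adj.sum(axis=1)) - Adj

M0 = np.kron(np.eye(4), A)           # instantaneous double-integrator part
M1 = np.kron(L, B)                   # delayed coupling part of system (14)

S = np.array([1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 4.0, 0.0])  # (x_i, v_i) pairs
hist = [S.copy()] * (lag + 1)        # constant pre-history on [-tau, 0]

for _ in range(steps):
    S = S + dt * (M0 @ S - M1 @ hist[0])  # forward Euler with delayed state
    hist.pop(0)
    hist.append(S.copy())

x = S[0::2]                          # positions converge toward their mean
print(np.round(x, 2))
```

With zero initial velocities the sum of positions is conserved, so the robots agree on the average of the initial positions; pushing the delay past the critical bound makes the same loop diverge.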

**Figure 2.** Experiment 1 Group 1 simulation results; *x* is position, *v* is velocity. (**a**) Trajectory of *x* changing with time; (**b**) Trajectory of *v* changing with time.

**Figure 3.** *Cont.*

**Figure 3.** Experiment 1 Group 2 simulation results; *x* is position, *v* is velocity. (**a**) Trajectory of *x* changing with time; (**b**) Trajectory of *v* changing with time.

**Experiment 2.** *Let system (14) consist of four robots whose communication topology is shown in Figure 4.*

**Figure 4.** Experiment 2 system communication topology.

The time delay from Robot 1 to Robot 2 is *τ*1, from Robot 2 to Robot 3 it is *τ*2, from Robot 3 to Robot 4 it is *τ*3, and from Robot 4 to Robot 1 it is *τ*4. If each adjacent communication weight *aij* is 1, then the Laplacian matrix *L* is:

$$L = \begin{pmatrix} 2 & 0 & -1 & -1 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & -1 & -1 & 2 \end{pmatrix} \tag{34}$$

Then $k\_0 = \min\_{\mathrm{imag}(\lambda\_i) \neq 0} \frac{|\lambda\_i|^2\,\mathrm{real}(\lambda\_i)}{\mathrm{imag}(\lambda\_i)^2} = \frac{5 \times 2}{1} = 10$, so $k\_1 \in (0, 10k\_2^2)$. Assume that the communication noise or measurement noise is white noise with a maximum amplitude of two. Set *k*<sup>1</sup> = 1, *k*<sup>2</sup> = 1; according to Theorem 2, *τ*max = 0.137.
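The gain bound can be checked numerically. The Laplacian below is a sketch assuming the Figure 4 topology is the cycle 1→2→3→4→1 with additional links 3→1 and 2→4; under that assumption the spectrum is {0, 2, 2 ± j}, which reproduces k0 = 5 × 2/1 = 10:

```python
import numpy as np

# Assumed reconstruction of the Figure 4 topology: cycle 1->2->3->4->1
# plus links 3->1 and 2->4; Laplacian spectrum {0, 2, 2 +/- j}.
L = np.array([[ 2,  0, -1, -1],
              [-1,  1,  0,  0],
              [ 0, -1,  1,  0],
              [ 0, -1, -1,  2]], dtype=float)

eigs = np.linalg.eigvals(L)
complex_eigs = [lam for lam in eigs if abs(lam.imag) > 1e-8]
# k0 = min over complex eigenvalues of |lambda|^2 * real(lambda) / imag(lambda)^2
k0 = min(abs(lam) ** 2 * lam.real / lam.imag ** 2 for lam in complex_eigs)
print(round(float(k0), 6))           # gain bound: k1 in (0, k0 * k2^2)
```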

In the first group of experiment 2, set *τ*<sup>1</sup> = 0.13, *τ*<sup>2</sup> = 0.12, *τ*<sup>3</sup> = 0.11, *τ*<sup>4</sup> = 0.1, and the initial posture is assumed to be (1,0), (2,0), (3,0), (4,0). The simulation results are shown in Figure 5.

To verify Theorem 2 and compare with the first group of experiments, in the second group of experiments, set *τ*<sup>1</sup> = 0.14, *τ*<sup>2</sup> = 0.141, *τ*<sup>3</sup> = 0.142, *τ*<sup>4</sup> = 0.143 under the same conditions. The simulation results are shown in Figure 6.

**Figure 5.** Experiment 2 Group 1 simulation results; *x* is position, *v* is velocity. (**a**) Trajectory of *x* changing with time; (**b**) Trajectory of *v* changing with time.

**Figure 6.** *Cont.*

**Figure 6.** Experiment 2 Group 2 simulation results; *x* is position, *v* is velocity. (**a**) Trajectory of *x* changing with time; (**b**) Trajectory of *v* changing with time.

According to Experiment 2, system (14) satisfying Lemma 2 can achieve consensus when all *τ<sup>d</sup>* are less than *τ*max, and diverges, failing to reach consensus, when all *τ<sup>d</sup>* are greater than *τ*max; thus Theorem 2 is verified.

#### **5. Physical Experiment Verification**

To verify the proposed formation control algorithm, we performed the experiment on a multi-mobile robot research platform built by our laboratory, consisting of self-designed three-wheeled omnidirectional robots carrying UWB (Ultra-Wide Band) ranging modules. The system is shown in Figure 7, the omnidirectional robot is shown in Figure 8, and the performance parameters are shown in Table 1. The theoretical analysis concerns the consensus of second-order systems, in which position and velocity become consistent. Since the omnidirectional robot is a fully actuated robot whose horizontal and vertical directions can be controlled separately, and the velocity control in a given direction is a second-order system, a group of omnidirectional mobile robots can be decomposed into two one-dimensional second-order multi-robot systems. Omnidirectional robots are therefore used to verify the proposed algorithm. In the experiment, the validity of the algorithm is judged by whether the speeds and positions of the robots are consistent after final stabilization. For data acquisition, the external positioning data of the robots are collected by a UWB positioning system built by our laboratory, and the speed of each robot is collected by the encoders on its wheels and transmitted to the central processing computer via Wi-Fi. The ranging error between the UWB ranging modules used in the experiment is 7 cm. Experiments were carried out in an indoor environment of 4 m × 5 m (length × width) to verify the effectiveness of the proposed algorithm.



**Figure 7.** Multi-robot research platform.

**Figure 8.** Omnidirectional robots. (**a**) Top view; (**b**) Side view.

It should be pointed out that when the proposed algorithm is applied to a practical multi-robot system, it is necessary to first determine the maximum communication delay between robots and the maximum amplitude of the noise environment, and then design *k*<sup>1</sup> and *k*<sup>2</sup> on this basis. At the same time, it should be noted that this experiment mainly focuses on verifying whether the system can achieve consensus under the control law; the collision avoidance behavior of the multi-robot system is not the emphasis of this research. Therefore, a collision avoidance routine is written into the low-level control program of each robot. When a robot is about to collide, the formation algorithm program is interrupted and the collision avoidance behavior is executed; when a safe distance between the robots has been restored, the formation algorithm program resumes [17,18].
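The interrupt logic just described can be sketched as a small state machine with hysteresis. The thresholds and function names below are hypothetical, not the authors' implementation:

```python
import math

# Assumed thresholds in meters: avoidance pre-empts the formation program
# inside COLLISION_DIST and releases it only once past SAFE_DIST.
COLLISION_DIST, SAFE_DIST = 0.30, 0.60

def select_behavior(avoiding, own_pos, other_positions):
    """Return ('avoid' | 'formation', updated avoiding flag) with hysteresis."""
    d_min = min(math.dist(own_pos, p) for p in other_positions)
    if d_min < COLLISION_DIST:
        avoiding = True                  # interrupt the formation program
    elif avoiding and d_min > SAFE_DIST:
        avoiding = False                 # safe distance restored: resume formation
    return ('avoid' if avoiding else 'formation'), avoiding

mode, flag = select_behavior(False, (0.0, 0.0), [(0.2, 0.1), (1.0, 1.0)])
print(mode)                              # 'avoid': nearest robot is ~0.22 m away
```

The hysteresis band (two thresholds instead of one) prevents the controller from chattering between the two behaviors at the boundary.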

The communication topology used in the experiment is shown in Figure 4. Each adjacent communication weight *aij* is 1, so $k\_1 \in (0, 10k\_2^2)$. The central processor logs on to each robot remotely through SSH and obtains the communication delay between each pair of robots whose communication weight is non-zero using the PING command. The time delay between robots in the actual communication environment is time-varying, so its maximum is taken. We get *τa*<sup>1</sup> = 0.056 s, *τa*<sup>2</sup> = 0.043 s, *τa*<sup>3</sup> = 0.047 s, *τa*<sup>4</sup> = 0.061 s. The time taken for each robot to receive data and process it is *τb*<sup>1</sup> = 0.021 s, *τb*<sup>2</sup> = 0.02 s, *τb*<sup>3</sup> = 0.021 s, *τb*<sup>4</sup> = 0.021 s. Therefore, the time delays between robots are *τ*<sup>1</sup> = 0.077 s, *τ*<sup>2</sup> = 0.063 s, *τ*<sup>3</sup> = 0.068 s, *τ*<sup>4</sup> = 0.082 s. Because this experiment is performed in a laboratory environment, it is assumed that the communication noise is white noise with a maximum amplitude of 2. According to Formula (26) and the moving speed of the omnidirectional robot, we set *k*<sup>1</sup> = 1, *k*<sup>2</sup> = 1.4. Four omnidirectional robots were placed at arbitrary initial poses, (0.83, 2.20, 0), (0.35, 1.74, 0), (0.67, 0.88, 0) and (0.56, 0.48, 0), respectively. In the physical experiment, the robots cannot converge to a single point, so Formula (2) is changed to:

$$\lim\_{t \to +\infty} \left[ x\_i(t) - x\_j(t) - F\_p \right] = 0 \tag{35}$$

where *Fp* is the formation parameter, *p* = 1, 2, ... , *n*. Since each robot is equipped only with a UWB ranging sensor for positioning, the robots perform pose determination before the experiment starts so that each robot's body coordinate system is aligned with the global coordinate system. The experimental results are shown in Figure 9. The experimental video can be found in reference [19].

**Figure 9.** Experiment process screenshots. (**a**) T = 0 s; (**b**) T = 15 s; (**c**) T = 25 s; (**d**) T = 30 s.

The experimental data collected by the UWB positioning system are shown in Figure 10.

The experiments show that the multi-robot system can eventually achieve consensus and form a formation in a variety of time-delay and noise environments, which verifies the effectiveness of the proposed algorithm.

**Figure 10.** Experiment 1 data of UWB positioning system.

#### **6. Conclusions**

Aiming at the consensus of multi-mobile robots under uncertain conditions such as communication delay, communication noise and measurement noise, we used the frequency domain analysis method, transformed the characteristic equation into a quadratic polynomial of the pure imaginary eigenvalue, and obtained the conditions for a second-order multi-robot system to achieve consensus under various time delay and noise conditions: when the time delays of all robots are less than the maximum time delay, the system can achieve consensus. Based on the two types of system communication topology, directed and undirected graphs, the results were verified by numerical simulation using MATLAB/Simulink, confirming the correctness of the theoretical derivation of the proposed algorithm. Finally, a multi-robot research platform was built, and formation control experiments were carried out in a real laboratory environment. The experimental results showed that the proposed algorithm could effectively bring second-order multi-mobile robot systems to consensus. This paper only analyzes the consensus problem of second-order systems, while most existing multi-mobile robot systems are higher-order systems. Therefore, the consensus analysis of higher-order systems under noise and time delay conditions will be the focus of our next research.

**Author Contributions:** H.W. wrote this article and accomplished the algorithm analysis; Q.L. was the principal researcher on this project and provided many valuable suggestions; N.D. performed the simulation; G.W. established the physical experimental system; and B.L. performed the experiment and analyzed the experimental data.

**Funding:** This research received no external funding.

**Acknowledgments:** The work in this paper is supported by the National Natural Science Foundation of China (Grant No. 61663014).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Project Report* **A Multi-Agent Based Intelligent Training System for Unmanned Surface Vehicles**

#### **Wei Han 1,2,\*, Bing Zhang 2, Qianyi Wang 2, Jun Luo 1, Weizhi Ran <sup>3</sup> and Yang Xu <sup>3</sup>**


Received: 10 February 2019; Accepted: 6 March 2019; Published: 15 March 2019

**Abstract:** The modeling and design of multi-agent systems is imperative for applications in the evolving intelligence of unmanned systems. In this paper, we propose a multi-agent system design that is used to build a system for training a team of unmanned surface vehicles (USVs) where no historical data concerning their behavior are available. In this approach, agents are built as the physical controllers of the USVs, and their cooperative decisions are used for the USVs' group coordination. To make our multi-agent system coordinate USVs intelligently, we built a multi-agent-based learning system. First, an agent-based data collection platform is deployed to gather competition data from agents' observations for on-line learning tasks. Second, we design a genetic-based fuzzy rule training algorithm that is capable of optimizing agents' coordination decisions in an accumulated manner. The simulation results of this study demonstrate that our proposed training approach is feasible and able to converge to a stable action selection policy for efficient multi-USV cooperative decision making.

**Keywords:** unmanned surface vehicles; multi-agent system; training system; genetic-based fuzzy rule learning; intelligent autonomous control; modeling and simulation

#### **1. Introduction**

The modeling and design of multi-agent systems for applications in the evolving intelligence of unmanned systems is interesting and promising [1–4], especially in situations where traditional methods can be costly, dangerous, or even impossible to realize. Several applications can be found in a very broad range of domains such as energy [5–7], security [8], robotics [9], and resource management [10]. A multi-agent system consists of autonomous agents that interact in an environment to achieve specific goals [11]. An autonomous agent, in this sense, is able to perceive its environment and perform actions using actuators.

Over the years, there have been several studies that have proposed principles for designing multi-agent systems, as well as approaches to coordinate the individual behavior of agents [12–16]. In most multi-agent application domains, a priori specification of the optimal agents' behavior is difficult due to the complexity and/or dynamics of the environment [11]. In such environments, it is natural for agents to adapt or learn optimal actions that maximize performance on-line. However, one of the key challenges is the need to build a simulation platform that can be used for fast training so as to gather enough training data to promote the intelligent development process.

In the context of multi-surface vehicles' modeling and design [17–19], the unmanned surface vehicle (USV) system is one of the development trends of modern weapon equipment [20]. It has become an important means of information confrontation, precision strikes, and special combat tasks in future warfare [21]. Over the years, unmanned surface vehicles have been applied to various kinds of military applications in the real world, including anti-mine warfare, submarine warfare, reconnaissance, and surveillance [22–24]. Due to the open and complex dynamic combat environment, the USV combat system must develop in the direction of autonomy and synergy: future unmanned surface combat systems should have a strong autonomous capacity, be controllable autonomously, and complete complex and diverse combat tasks independently or collaboratively in a complex, uncertain environment [25]. However, human knowledge is difficult to apply directly to the coordination of USVs.

We propose a multi-agent-based intelligent training system for unmanned surface vehicles (USVs). We focus on the problem of training agents of a multi-agent-based unmanned surface vehicles system where no historic data concerning the agent behavior is available. Using the team learning framework, we provide approaches for learning the rule base for multi-agent systems, designing the learning environment for simulating cooperative and competitive agents' behavior, and gathering competition data from agents' observation for off-line learning tasks. To this end, this paper first presents a decentralized coordination platform and simulator design based on a multi-agent architecture. Secondly, a USV-based agent model is presented. The paper also proposes a data collection platform for agent learning and USV training. Lastly, a fast learning and training algorithm design based on the agents' rule base is presented.

To evaluate our approach, we used an island-conquering scenario where two teams of unmanned surface vehicles compete to conquer islands in an environment. We model this case study as a partially-observable stochastic game where one team has to learn the behavior that maximizes their returns against a human-controlled team. Our empirical evaluations show that our approach to learning the knowledge base of a multi-USV system, when applied to the trained team, was able to find a knowledge base (KB) to achieve better performance.

We first present the problem this study seeks to address in Section 2. Section 3 discusses the multi-surface vehicle training system design. Finally, in Section 4, we discuss our experiments and analysis of the simulation results followed by the conclusion of our study in Section 5.

#### **2. Multi-Agent-Based USVs' Training Problem**

We first discuss the multi-agent-based intelligent training system learning problem of this study.

In general, a stochastic game is an extension of the Markov decision process (MDP) to the multi-agent context. In such a game, agents may have conflicting goals, and their joint actions determine the rewards and the transitions between states. Stochastic games usually assume that agents have a complete view of their states, whereas the partial observation case is discussed under the more general partially-observable stochastic games (POSGs) domain. Our study is performed under the POSG case.

**Definition 1.** *A POSG is a tuple* ⟨*X*, *U*1, ..., *Un*, *O*1, ..., *On*, *f*, *R*1, ..., *Rn*⟩*, where:*


In this study, we consider a team of multiple unmanned surface vehicles that must be trained to perform a given mission *m* in a mission space *D* ⊂ R*<sup>n</sup>* consisting of opposing forces. Let *H*(*m*) be the objective function dependent on the mission. Given that a team *T* consists of *N* USVs, the dynamics of the *i*-th USV are given by:

$$\dot{x}\_i = f\_i(x\_i(t), u\_i(t)), \quad i = 1, 2, \ldots, N \tag{1}$$

where *xi*(*t*) ∈ R*<sup>n</sup>* and *ui*(*t*) ∈ R*<sup>m</sup>* are the state and control inputs (available actions) of the USV, respectively, at time *t*. Depending on the USV model and mission type, a set of constraints (such as obstacle avoidance and the physical limitations of the USV) is imposed on the state *xi*(*t*) of a USV. The control inputs of a USV are bounded according to the limits of the USV's actuators: |*ui*(*t*)| < *umax*.

The reward of a USV *i* after taking action *ui,t* ∈ *Ui* in stage *t* is denoted as *ri,t+1*. The individual policies *hi* : *X* × *Ui* → [0, 1] of the USVs in a team form the joint policy **h**. In the case of team learning, the joint policy is defined by a single learner. Since the reward of each USV in a team depends on the joint action, the respective USV rewards depend on the joint policy:

$$R\_i^{\mathbf{h}}(\mathbf{x}) = \mathbb{E}\left\{\sum\_{t=0}^{T} \gamma^t r\_{i,t+1} | \mathbf{x}\_0 = \mathbf{x}, \mathbf{h} \right\} \tag{2}$$

where *γ* is the discounting factor and *T* is the length of the horizon. The training objective for the intelligent training system is to find a policy that maximizes Equation (2), such that the trained team of USVs can outperform its rival USVs while adapting to the changing dynamics of its operating environment.
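The discounted return in Equation (2) can be illustrated with a minimal sketch; in practice the expectation is estimated by averaging this quantity over many sampled rollouts (the reward values below are illustrative):

```python
def discounted_return(rewards, gamma):
    """Discounted return of Equation (2) for one sampled trajectory:
    the sum over t of gamma^t * r_{t+1}."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Example: three time steps with gamma = 0.9
print(discounted_return([1.0, 2.0, 3.0], 0.9))  # 1.0 + 0.9*2.0 + 0.81*3.0 ≈ 5.23
```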

#### **3. Multi-Agent-Based Training System Design**

In this section, we discuss the multi-agent-based training system design of our study, as well as the ideas involved in training the agents. The section first presents the architecture of the multi-agent-based intelligent training system. Next, we present the USV agent model for cooperative training and competition. Finally, the section presents the agent data collection platform for gathering competition data for agent information sharing and the learning algorithm.

#### *3.1. System Architecture*

The architecture of the system used in this study is presented in Figure 1. We decouple the agents' control from the main environment in order to allow different implementations of controllers. In this case, we consider the multi-agent simulation platform as a server, and any attached controller becomes a client. Each USV agent selects its action based on the observed information from the environment. Based on the data visualization model of the USV, the data collection platform forms and mounts the corresponding resource tree structure of the USV. Historical data are provided to the upper machine learning algorithm by the data collection platform after data fusion. The obtained algorithm controller is then loaded into the USV.

The multi-agent cooperative reasoner component serves as the policy used to select an action for a controlled agent using the agent's local observation. In this study, we model the reasoner as a fuzzy inference system (FIS) decomposed into a genetic fuzzy tree (GFT). Thus, the reasoner maintains a KB, which is used for mapping fuzzy values of an agent's observation to an action for execution in the POSG.

The learner component of the intelligent control model facilitates KB learning, tuning, or optimization. It seeks to find the policy that exhibits the designed behavior of the controlled team as specified by the reward function. Thus, gradient-based approaches such as neural networks (NN) [26], as well as non-gradient-based approaches, can be employed for this purpose. As already indicated by our usage of a GFT for the reasoner component, we use the GA in this study to learn the fuzzy rules and tune the membership functions (MFs) of a GFT. Each candidate KB is applied to the control problem for a whole simulation episode (we use round and episode interchangeably), as in the Pittsburgh approach. Hence, the rewards received at each time step are aggregated and assigned as the fitness value of the candidate KB used at the end of the round/episode.

**Figure 1.** The architecture of the multi-agent based intelligent training system.

#### *3.2. Agent Design*

The USV agent model that is used by the intelligent training system for cooperative training and competition is equipped with a radar and a weapon system. The radar of a USV can detect a hostile threat within its detection range. In this work, the weapon system of a USV consists of a calibrated gun with a limited forward turning angle and firing range. A USV can turn in both directions (left or right), with limited speed and turning radius *r*. In this study, only surge, sway, and yaw are used to describe the USV's movement at sea, as shown in Figure 2. Therefore, the kinematic relationship between the USV position in the global inertial frame XYZ and the boat body-fixed frame xyz can be defined as:

$$\begin{aligned} \dot{x} &= u\cos(\psi) - v\sin(\psi) \\ \dot{y} &= u\sin(\psi) + v\cos(\psi) \\ \dot{\psi} &= \lambda \end{aligned} \tag{3}$$

The location and orientation of USVs in the coordinates of the environment are represented by (*x*, *y*) and *ψ* in the Earth-fixed reference frame, while *u*, *v*, and *λ* represent the velocity of surge, sway, and yaw in the body-fixed reference frame, respectively.
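The kinematics of Equation (3) can be integrated numerically; a minimal forward-Euler sketch (the step size and velocity values are illustrative):

```python
import math

def usv_step(x, y, psi, u, v, lam, dt):
    """One forward-Euler step of the planar kinematics in Equation (3):
    surge u and sway v (body frame) are rotated by the heading psi into
    the Earth-fixed frame; lam is the yaw rate."""
    x += (u * math.cos(psi) - v * math.sin(psi)) * dt
    y += (u * math.sin(psi) + v * math.cos(psi)) * dt
    psi += lam * dt
    return x, y, psi

# Pure surge at heading 0 moves the USV along the X axis.
print(usv_step(0.0, 0.0, 0.0, u=1.0, v=0.0, lam=0.0, dt=0.5))  # (0.5, 0.0, 0.0)
```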

**Figure 2.** Schematic diagram of the USV model and planar motion where *XEOEYE* is the Earth-fixed reference frame and *XBOBYB* denotes the body-fixed reference frame.

#### *3.3. Data Collection Model*

Agents' observations can be persisted in the course of training to enable off-line operations. Data collected during training can be useful for agents' cooperative decision making. Furthermore, gathering training data means that different agent learning algorithms can be employed, both on-line and off-line, to improve agents' performance. Moreover, the data can be organized to enable communication between agents. The gathered domain data (in resource form) can be used for purposes such as improving agents' performance with state-of-the-art deep reinforcement learning (DRL) algorithms. Hence, the data collector component is not only necessary, but can also extend the scope of learning algorithms that can be employed for intelligent training.

The data collection model is designed using a training and control platform consisting of loosely-coupled components and a common resource model. One such paradigm for implementing modular services is the Future Airborne Capability Environment's (FACE (http://www.opengroup.org/face)) common operating environment approach. In Figure 3, we show the components of FACE with regard to our study. Their descriptions, in the context of our study, are as follows:


**Figure 3.** The architecture of the agent data collection platform based on Future Airborne Capability Environment (FACE) components.

#### *3.4. Multi-Agent-Based USV Training Algorithm Design*

The intelligent control model, as depicted in Figure 4, is responsible for performing learning and reasoning. It controls a team of agents in the simulation environment by sending agent actions to the environment every time step. Thus, at time step *t*, it sends the joint actions *ut* to the environment for execution, where the action for agent *i*, *i* ≤ *n*, is *ui,t* ∈ *ut*. The intelligent control model consists of a learner and a reasoner that employ fuzzy logic and genetic algorithms (GA), a combination known as genetic fuzzy systems (GFS) [27]. In this technique, the GA is used to learn and tune the rule base and membership functions of an FIS, respectively. To do this, an initial population of solutions, or strings, is created to encode the rule base and membership functions.

Classical approaches such as Michigan [28], Pittsburgh [29], and iterative rule learning [30] are mostly used to derive GFS fuzzy rules. We give an example to illustrate how the rule base is encoded in this study using the Pittsburgh approach. Suppose *X*1 and *X*2 are input variables, each with linguistic terms {*term*1, *term*2}, and *Y* is an output variable with terms {*a*1, *a*2}, with an arbitrary rule base of an FIS as follows:


Assume we assign codes zero and one to *a*1 and *a*2, respectively; the chromosome 0110 is then obtained. This approach can represent an if-then rule in a single digit, which implies that each chromosome encodes the possible outputs of a set of rules.

Similarly, to tune membership functions, each digit in the string corresponds to some endpoint of a membership function. GA can then be used to tune the parameters as part of the evolution process, as found in [31], where the initial parameters of a triangular MF T(*α*, *β*, *γ*) are tuned using:

$$\alpha\_{i+1} \leftarrow (\alpha\_i + \delta\_i) - \eta\_i \tag{4}$$

$$\beta\_{i+1} \leftarrow (\beta\_i + \delta\_i) \tag{5}$$

$$\gamma\_{i+1} \leftarrow (\gamma\_i + \delta\_i) + \eta\_i \tag{6}$$

where *δ* and *η* are the tuning coefficients, and *α*, *β*, and *γ* parameterize T. *δ* shifts the MF to the right or left, whereas *η* shrinks or expands the MF. Therefore, every MF has two tuning parameters that can be optimized using GA.
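A minimal sketch of this tuning step; the sign conventions are an assumption chosen so that *δ* shifts the triangle and *η* symmetrically widens or narrows its support, per the description above:

```python
def tune_triangular(alpha, beta, gamma, delta, eta):
    """Apply one tuning step to a triangular MF T(alpha, beta, gamma):
    delta shifts the whole MF left/right, while eta moves the two
    endpoints apart (eta > 0 expands the support, eta < 0 shrinks it).
    Sign conventions follow the textual description, not any library."""
    return (alpha + delta - eta, beta + delta, gamma + delta + eta)

# Shift a unit triangle right by 0.25 and widen it by 0.25 on each side.
print(tune_triangular(0.0, 0.5, 1.0, delta=0.25, eta=0.25))  # (0.0, 0.75, 1.5)
```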

**Figure 4.** The architecture of the multi-agent-based intelligent training system. KB, knowledge base. GFT, genetic fuzzy trees.

Although a GFS can be used to find KBs that optimize the fitness function, it is inefficient to use a single GFS for a complex control problem: the GA search space complexity, the tendency toward redundant rules, and the KB size all increase. Using a genetic fuzzy tree (GFT), as in [2], helps to mitigate this problem. A GFT is, essentially, an ensemble of GFSs arranged in a tree structure according to a logical and conditional sequence of execution. A GFT enables the decomposition of a complex fuzzy control system (FCS) or GFS into logical sections, with each node in the tree focusing on one aspect of the control problem. Each GFS in a GFT defines its own KB. Hence, the KB of a GFT consisting of *m* GFSs *G* = {*gfs*1, *gfs*2, ..., *gfsm*} may have the genetic structure (represented as a concatenated string) *gfs*<sup>rb</sup>1 *gfs*<sup>rb</sup>2 ... *gfs*<sup>rb</sup>*m* *gfs*<sup>mf</sup>1 *gfs*<sup>mf</sup>2 ... *gfs*<sup>mf</sup>*m*, where *gfs*<sup>rb</sup>*i* and *gfs*<sup>mf</sup>*i*, *i* ≤ *m*, denote the RB and MF string/genome, respectively, of a given GFS. This structure can be seen in Figure 5.


**Figure 5.** The knowledge base genetic structure of a genetic fuzzy tree. Each chromosome or member of the GA population takes this form.

Algorithm 1 shows how, beginning with an initial population, the KB of a GFT can be learned/optimized using the GA process. In Line 2, the initial population of the GFT KB is generated as the current population; this can be based on a predefined KB set or randomly generated. In Lines 3–4, the current population is passed on to the FIS to be used for the control task. After all these KBs have taken their turns performing the control task and have been evaluated, they are subjected to the GA operators (Lines 6–17). This operation continues until the termination condition is reached. The resulting candidate KBs are returned as the best-performing or most suitable KBs.

**Algorithm 1** Procedure: GFS procedure.

**Input:** GA hyperparameters
**Output:** Best set of GFT KBs

1: Initialize *t* = 0
2: Generate initial population P0
3: Set current population C*t* := P0
4: Run the fuzzy control system using C*t*
5: Evaluate the members of C*t*
6: **while** *not termination condition* **do**
7: &emsp;P*t* := *selection*(C*t*)
8: &emsp;O*t* := *crossover*(P*t*)
9: &emsp;C*temp* := *mutate*(O*t*)
10: &emsp;**if** elitism **then**
11: &emsp;&emsp;C*temp* := *applyElitism*(C*temp*) // keeps a percentage of previous chromosomes
12: &emsp;**end if**
13: &emsp;*t* := *t* + 1
14: &emsp;C*t* := C*temp*
15: &emsp;Run the fuzzy control system using C*t*
16: &emsp;Evaluate the members of C*t*
17: **end while**
18: Return the best-performing candidate solutions
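The loop of Algorithm 1 can be sketched generically as below. The operator choices (2-way tournament selection, single-point crossover, gene-resampling mutation) and the toy fitness are illustrative assumptions; in the actual system the fitness comes from running the fuzzy control system for an episode:

```python
import random

def ga_procedure(init_pop, fitness, generations=60,
                 elite_frac=0.1, mut_rate=0.1, seed=0):
    """Compact sketch of the Algorithm 1 loop.  A KB is a list of
    binary genes (Pittsburgh-style consequent codes); selection is
    2-way tournament, crossover is single-point, mutation re-samples
    a gene, and an elite fraction of the old population is kept."""
    rng = random.Random(seed)
    pop = [list(kb) for kb in init_pop]
    for _ in range(generations):
        scores = [fitness(kb) for kb in pop]           # run + evaluate C_t

        def pick():                                    # tournament selection
            i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
            return pop[i] if scores[i] >= scores[j] else pop[j]

        children = []
        while len(children) < len(pop):
            a, b = pick(), pick()
            cut = rng.randrange(1, len(a))             # single-point crossover
            children.append([rng.randrange(2) if rng.random() < mut_rate else g
                             for g in a[:cut] + b[cut:]])  # mutation
        n_elite = max(1, int(elite_frac * len(pop)))   # elitism
        elite = sorted(zip(scores, pop), key=lambda p: -p[0])[:n_elite]
        pop = [kb for _, kb in elite] + children[:len(pop) - n_elite]
    scores = [fitness(kb) for kb in pop]
    best_score, best_kb = max(zip(scores, pop))
    return best_kb                                     # best-performing KB

# Toy fitness: maximize the number of 1-genes in a 4-gene KB.
best = ga_procedure([[0, 0, 0, 0] for _ in range(20)], fitness=sum)
print(best)  # tends toward the all-ones chromosome
```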

Non-stationarity is an inherent problem in the multiple learning agents' environment. In this section, we present an approach for incorporating agent-induced non-stationarity awareness into a GFT based on the framework of [32].

Since the root of the GFT has the strongest impact on the learning process or action selection, we consider it to be the main component for addressing this problem. Unlike the other GFSs in the GFT, which can be modeled to address specific aspects of a control problem (domain dependent), we consider the root GFS to be a special case.

We regard the sub-trees formed under the root GFS as the elements of the BR function co-domain. Thus, a BR function, in the GFT case, selects a sub-policy to be used for selecting an action for the agent. The influence function is also represented as an FIS (or GFS in our case, since GA is applied to learn the KB), which takes PGFs as input variables and a belief function (BF) as the output variable. This is motivated by the idea that an influence function maps beliefs to possible best responses and can be designed to use deductive reasoning; these are properties that an FIS exhibits due to fuzzy logic. Thus, the root of the GFT takes incomplete/partial observations and produces its belief of another agent's policy. This connotes that the terms of the output variable then serve as possible opponent strategies as perceived by the reasoning agent.

In the case of the PGFs that serve as input variables to the GFT root FIS, the history of observations concerning the opponent or another agent is maintained and used by each PGF variable to produce an input value. This PGF input value can then be fuzzified and used in the inference process. The problem that arises is then how to design a PGF to capture the adaptation dynamics of another agent. In the work of [33], an agent *i* estimated the model of agent *j* as:

$$\hat{\sigma}\_j^i(u\_j) = \frac{\mathbf{C}\_j^i(u\_j)}{\sum\_{a \in \mathcal{U}\_j} \mathbf{C}\_j^i(a)} \tag{7}$$

where *C<sup>i</sup>j*(*uj*) is the frequency with which agent *i* observed agent *j* taking action *uj*. Therefore, given the history of observations, Equation (7) can be extended to include the observation of agent *i* as:

$$\hat{\sigma}\_j^i(o\_i, u\_j) = \frac{\mathbf{C}\_j^i(o\_i, u\_j)}{\sum\_{a \in \mathcal{U}\_j} \mathbf{C}\_j^i(o\_i, a)} \tag{8}$$

where *C<sup>i</sup>j*(*oi*, *uj*) is the frequency with which agent *i* observed agent *j* taking action *uj* when its local observation was *oi*. *σ̂<sup>i</sup>j*(*oi*, *uj*) becomes the input value of agent *i*'s PGF for monitoring a specific action *uj* of agent *j*. With the input variables (PGFs) and the output variable (BF) determined for the root FIS, GA can be used to find the best response to agent-induced non-stationarity and thereby stabilize learning in the POSG.
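The frequency-based opponent model of Equations (7)–(8) can be sketched as below; the class name, observation labels, and action names are illustrative:

```python
from collections import Counter, defaultdict

class OpponentModel:
    """Empirical opponent model in the spirit of Equations (7)-(8):
    agent i keeps frequency counts C(o_i, u_j) of agent j's actions per
    local observation and normalizes them into a conditional
    distribution over agent j's action set."""
    def __init__(self, actions):
        self.actions = actions
        self.counts = defaultdict(Counter)  # observation -> action counts

    def observe(self, o_i, u_j):
        self.counts[o_i][u_j] += 1

    def sigma_hat(self, o_i, u_j):
        total = sum(self.counts[o_i][a] for a in self.actions)
        # Uniform prior when nothing has been observed yet.
        return self.counts[o_i][u_j] / total if total else 1 / len(self.actions)

model = OpponentModel(actions=["attack", "conquer", "evade"])
for u in ["attack", "attack", "evade"]:
    model.observe("enemy_near", u)
print(model.sigma_hat("enemy_near", "attack"))  # 2/3 ≈ 0.667
```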

#### **4. Multi-Agent-Based Simulator Design**

The simulator used for the experiment was developed using the Java programming language. The fuzzy control library [34] was extended for the design and encoding of fuzzy rules for learning purposes as implemented by the intelligent control model of our architecture. In this section, we present the experiment conducted for this study and follow it up with the analysis of the simulation results.

#### *4.1. Environment Setup*

To evaluate the effectiveness of the proposed learning system, we adopted a scenario of multiple boats competing to conquer islands while engaging in combat. In this scenario, the maritime environment consisted of *N* islands and two teams of unmanned surface vehicles (boats), referred to as the blue and red forces. The boats were equipped with radar and guns for detecting and shooting enemies, respectively. The guns were set to have a fixed left-to-right traversal angle and shooting range, as shown in Figure 6. Since both teams had conflicting goals, a team achieved its goal by contending with opponent boats. Each team had information on the location and number of islands and their states: conquered or unconquered by the team. An island is conquered by a team if a member of the team moves to the coordinate of the island and stays there for that time step while no opponent boat moves to that island. If boats from both teams occupy an island at the same time, the island is not awarded to either team for the elapsed time steps. For our experiment, we implemented two controllers, one for each team. The controller for the blue force used fixed rules provided by humans, whereas the red force controller, which we sought to train, had to learn the best rules to maximize performance. At the beginning of each time step, the simulation environment received a batch of commands from the (ally and enemy) controllers of both teams and updated the environment.
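The island-ownership rule described above can be sketched per island and time step; the team labels and function name are illustrative:

```python
def update_island(owner, blue_here, red_here):
    """Sketch of the island-conquering rule: a lone occupying team
    conquers the island; simultaneous occupation by both teams leaves
    it unawarded (None) for that time step; an unoccupied island keeps
    its previous owner."""
    if blue_here and red_here:
        return None
    if blue_here:
        return "blue"
    if red_here:
        return "red"
    return owner

print(update_island("blue", blue_here=False, red_here=True))   # red
print(update_island("red", blue_here=True, red_here=True))     # None
print(update_island("red", blue_here=False, red_here=False))   # red
```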

**Figure 6.** A snapshot of the island-conquering scenario. In the middle are the 5 islands to be conquered by both teams. The circles around the boats are their detection ranges, and the forward-looking arcs are the firing regions of the boats.

The possible actions of each agent are also explained below:


#### *4.2. Competition Objective*

The goal of each team was to conquer all islands and destroy opponents, while staying alive. The performance of each team was evaluated after each time step using the function:

$$R\_t^f = A\_t^f \times p + I\_t^f - D\_t^{f'} \times p \tag{9}$$

where *A<sup>f</sup>t* is the destruction suffered by the opponent team *f′* as a result of attacks from team *f* in time step *t*, *I<sup>f</sup>t* the conquered-island points received by team *f*, *D<sup>f′</sup>t* the damage caused by the opponent *f′* to the team being assessed *f*, and *p* the points awarded for boat attacks/destruction. The winner of an encounter is decided at the end of the episode: the team with the highest score is declared the winner.
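Equation (9) reduces to simple arithmetic per time step; a minimal sketch (the parameter values are illustrative):

```python
def team_reward(attack_damage, islands, damage_taken, p):
    """Per-step team score of Equation (9): damage dealt to the
    opponent and islands held add points; damage received from the
    opponent subtracts them.  p is the points per attack/destruction."""
    return attack_damage * p + islands - damage_taken * p

# Scenario-1 style scoring with p = 2 points per hit.
print(team_reward(attack_damage=3, islands=4, damage_taken=1, p=2))  # 3*2 + 4 - 1*2 = 8
```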

#### *4.3. Data Collector for Training and Learning*

As mentioned earlier, gathering training data is useful in a number of ways. The cooperation of agents can be enhanced when agents share their observations. Furthermore, data gathered during training can later be used by other learning algorithms, such as deep learning methods, for analysis and performance improvement. The resource model shown in Figure 7 is used for on-line data gathering during training; the data that are sent to a controller as feedback take this form. The main components are as follows:


**Figure 7.** The simulation feedback resource structure of the island-conquering case study.

#### *4.4. Training and Learning Algorithm*

In order to provide a realistic virtual environment and opponent for the red force to train against, the blue force control algorithm uses human-defined rules. Table 1 shows example encoded rules used for the blue force's task selection during simulation. The GFT of the red force control algorithm is illustrated in Figure 8, and the details of each of the GFSs of the GFT can be seen in Table 2.

**Table 1.** Example encoded task selection rules used by the enemy team.


**Figure 8.** The genetic fuzzy tree structure used for training the ally team. The rectangles are FISs, while the circles represent predefined actions in the simulation.


**Table 2.** Details of the GFSs that constitute the GFT used for training the ally team.

The Assignment GFS was designed to consider opponent-induced non-stationarity with the following PGFs:


The linguistic terms of each PGF are low, moderate, and high. Each PGF first computes the action probabilities for each detected agent. A softmax is then computed over the reported action probabilities, and the maximum is selected as the PGF input value. The belief function output variable selects the perceived opponent strategy in each IF-THEN rule. The other input variables are:


Triangular MFs are used to define the semantics of the fuzzy rules. The initial MF tuning parameters were sampled uniformly from [−1.5, 1.5] for each input variable MF. The GA parameters we set for training the ally team controller are presented in Table 3.
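The PGF aggregation described above (a softmax over the per-opponent action probabilities reported for a monitored action, with the maximum taken as the crisp PGF input) can be sketched as follows; the probability values are illustrative:

```python
import math

def softmax(values):
    """Numerically-stable softmax over a list of real values."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]

def pgf_input(action_probs_per_opponent):
    """Sketch of the PGF input computation: softmax over the reported
    per-opponent action probabilities, then the maximum is taken as the
    crisp value handed to fuzzification."""
    return max(softmax(action_probs_per_opponent))

# Three detected opponents reporting probabilities for one monitored action.
print(round(pgf_input([0.9, 0.2, 0.1]), 3))  # ≈ 0.514
```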


**Table 3.** The GA parameters used for training the ally team.

As indicated in Table 3, we combined a mutation probability schedule with an adaptive non-uniform mutation to control exploration and exploitation. The ally team learner was set to maintain the last 5 observations of each detected enemy boat.

#### *4.5. Experiment Results and Validation Analysis*

We evaluated our intelligent training system using four distinct scenarios of the island-conquering case study. The parameters used in Scenario 1 for both teams are shown in Table 4. Furthermore, in Scenario 1, sinking an opponent and conquering an island were each worth two points. In Scenario 2, the number of USVs in a team was reduced to three, and the points for sinking an opponent were reduced to one. The simulations for Scenarios 1 and 2 were run concurrently on two different machines for 2020 episodes, starting from 20 randomly-generated chromosomes as the initial rules for the red force, against the blue force, which used a human-sampled rule set. Figure 9 shows snapshots of the competition as the simulation progresses.

If the blue force began an episode knowing exactly what to do in the environment while the red force was yet to learn, then the blue force should perform better than the red force in the initial stages of the simulation. However, as the simulation progresses, the performance of the red force should improve, since the red force learns while the blue force uses static rules. Furthermore, if the blue force uses different sets of rules, even though each set is static, then the performance measure will tend to rise and fall dynamically with the strength of the rule set currently being simulated.


**Table 4.** The fixed simulation parameters.

**Figure 9.** Illustration of the best-performing rules as training progresses. At the initial stages, the blue team had total control over the red team. However, during the final stages of the training, the performance of the red team surpasses that of the blue team.

The observed simulation results of both teams during training are presented in Figure 10. As can be seen in Figure 10a,c, the blue force performed better than the red force during the first generations of rules. However, after 10 and 13 generations, respectively, the red force began to outperform the blue force in the environment, which caused the red force's performance to rise in Figure 10a,c. In Figure 10b,d, we compare the highest score attained by both teams in each generation. The same pattern can be seen in these graphs: in the initial generations, the blue force scored high compared to its red force counterpart.

**Figure 10.** Illustration of the performance of the red and blue teams over 2020 episodes (100 generations) during training. Both series of the episodic performances are averages over 20 different sequences of episodes. (**a**) Performance of teams during training of six boats; (**b**) best performance of teams per generation of rules of six boats; (**c**) team size = 3 boats, sinking an enemy = 1 pt, conquering an island = 2 pts; (**d**) generational performance of three boats; (**e**) team size = 10; firing range, detection range, and firing angle each set to 80% of those of the enemy; (**f**) high score per generation of 10 boats.

In the third simulation, the capabilities of the red force were constrained below those of its counterpart blue force. With each team consisting of 10 boats, the firing range, detection range, and firing angle of the ally team were each set to 80% of those of the opponent team. Furthermore, we set the initial population of rules to the rule base of Simulation 1. Figure 10e,f shows the performance graphs of both teams.

Finally, a simulation of 15 boats was conducted with the objective of destroying more opponent boats while still conquering islands. To this end, we assigned three points for destroying an opponent boat and one point for conquering an island. All other parameters were reset to the initial fixed parameters, with the exception of the firing range, which was maintained at 80% of the opponents' firing range. The initial population of rules in this scenario was the final rule population from the preceding scenario. The performance graph is shown in Figure 10g,h.
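The four training scenarios differ only in team size, point allocation, and capability scaling. One minimal way to keep such settings together is a configuration record; all field names here are assumptions for illustration, and the point values for the first simulation are not stated in the text, so 1/1 is used as a placeholder.

```python
from dataclasses import dataclass

@dataclass
class ScenarioConfig:
    team_size: int           # boats per team
    points_destroy: int      # points for destroying an opponent boat
    points_island: int       # points for conquering an island
    capability_scale: float  # ally ranges as a fraction of the enemy's

# Settings as described in the text for the four simulations
# (first scenario's point values are placeholders).
scenarios = [
    ScenarioConfig(team_size=6,  points_destroy=1, points_island=1, capability_scale=1.0),
    ScenarioConfig(team_size=3,  points_destroy=1, points_island=2, capability_scale=1.0),
    ScenarioConfig(team_size=10, points_destroy=1, points_island=1, capability_scale=0.8),
    ScenarioConfig(team_size=15, points_destroy=3, points_island=1, capability_scale=0.8),
]
```

Collecting the parameters in one record makes it straightforward to sweep over scenarios while reusing the same reasoner and learning loop, as the experiments do when seeding one scenario's rule population from another's.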

The observed performance of these KBs, run over one hundred episodes against the enemy team, is presented in Figure 11. The blue line in Figure 11a represents the best performance of the enemy team in the three cases against the ally team's performance in all cases. Figure 11b illustrates the average scores over 100 episodes with an increasing number of boats in a team. The blue bar shows the average scores of the enemy teams, while the remaining bars represent the performance of the ally team with varied team sizes and point allocations.

While the blue force's performance degraded, the red force's reasoner was able to maintain a set of KBs that achieved relatively high performance for most of the simulation across all training scenarios. Comparing the corresponding episode scores of the final KB of the red force against the blue force rule base used during training revealed that the trained red force was better in at least 82% of all cases, as can be seen in Figure 11. The simulation results demonstrate the feasibility of our approach for multi-agent KB learning and its ability to converge to a stable action selection.

**Figure 11.** Performance validation of the best-performing rules of the red force after training. Blue represents the highest performance of the blue force. Red is the performance when two points are assigned for destroying an enemy boat and one point for conquering an island; green shows the scores obtained when an island is worth two points and destroying an opponent boat one point; light blue is the case when both conquering an island and destroying an opponent boat are worth one point each.

#### **5. Conclusions and Future Work**

In this study, we have presented an unmanned multi-surface vehicle training approach for complex control. We used team learning, where a central learner controls the agents in an environment modeled as a partially observable stochastic game. We modeled the control problem as a decomposed FIS and provided practical ways for constructing and learning the FIS KB. Our proposed framework enables the use of different FIS decompositions for a complex control problem with minimal or no modification to the reasoner implementation. It also enables the incorporation of agent-induced non-stationarity awareness in the learning process and a resource model for gathering agents' local observations for off-line learning tasks. Our contributions enable multi-agent control to be performed in domains where no historical data are available for training, but the desired system behavior can be specified as a function of the agents' performance. The multi-surface vehicle island-conquering case study demonstrates the feasibility and convergence properties of our approach. Our future work will focus on improving the performance of our approach using multi-agent deep reinforcement learning techniques with the gathered agents' observations.

**Author Contributions:** Conceptualization, B.Z.; Investigation, W.H.; Methodology, J.L.; Software, W.R.; Writing—original draft, Q.W. and Y.X.; Writing—review & editing, J.L.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Applied Sciences* Editorial Office E-mail: applsci@mdpi.com www.mdpi.com/journal/applsci
