**Multi-Agent Systems 2019**

Editors

**Andrea Omicini Stefano Mariani**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors*

Andrea Omicini
Università di Bologna
Italy

Stefano Mariani
Università degli Studi di Modena e Reggio Emilia
Italy

*Editorial Office*
MDPI
St. Alban-Anlage 66
4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Applied Sciences* (ISSN 2076-3417) (available at: https://www.mdpi.com/journal/applsci/special_issues/Multi-Agent_Systems_2019).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Article Number*, Page Range.

**ISBN 978-3-03943-046-8 (Hbk) ISBN 978-3-03943-047-5 (PDF)**

© 2020 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**



## **About the Editors**

**Andrea Omicini** is Full Professor at DISI, the Department of Computer Science and Engineering of the Alma Mater Studiorum–Università di Bologna, Italy. He holds a PhD in computer and electronic engineering, and his main research interests include coordination models, multi-agent systems, intelligent systems, programming languages, autonomous systems, middleware, simulation, software engineering, pervasive systems, and self-organization. He has published over 300 articles on these subjects, in addition to having edited numerous international books, served as Guest Editor of several Special Issues of international journals, and held many invited talks and lectures at international conferences and schools.

**Stefano Mariani** received his PhD degree in Computer Science from the University of Bologna, Bologna, Italy, in 2016. He is currently Assistant Professor of Computer Science at the University of Modena and Reggio Emilia, Reggio Emilia, Italy. He has been involved in the EU FP7 Project SAPERE and in the EU H2020 Project CONNECARE. His current research interests include coordination models and languages, agent-oriented technologies, pervasive computing, self-organization mechanisms, and socio-technical systems.

## *Editorial* **Special Issue "Multi-Agent Systems": Editorial**

#### **Stefano Mariani and Andrea Omicini**


Received: 16 July 2020; Accepted: 25 July 2020; Published: 1 August 2020

**Abstract:** Multi-agent systems (MAS) are built around the central notions of agents, interaction, and environment. Agents are autonomous computational entities able to proactively pursue goals and reactively adapt to environmental change. In doing so, they leverage their social and situated capabilities: interacting with peers, and perceiving and acting on the environment. The relevance of MAS is steadily growing, as they are extensively and increasingly used to model, simulate, and build heterogeneous systems across many different application scenarios and business domains, ranging from logistics to social sciences, from robotics to supply chains, and more. The reason behind such widespread and diverse adoption lies in the great expressive power of MAS in modeling, and actually supporting the operational execution of, a variety of systems demanding decentralized computation, reasoning skills, and adaptiveness to change: a perfect fit for the central MAS notions introduced above. This special issue gathers 11 contributions sampling the many diverse advancements currently ongoing in the MAS field.

**Keywords:** multi-agent systems; agent-based modeling; agent-based simulation; decision support

#### **1. Introduction**

As intelligent systems increasingly pervade our everyday lives, the need for a coherent set of abstractions and technical tools to support their design, development, and maintenance keeps growing. Multi-agent systems (MAS) nowadays represent the richest and most reliable source of such abstractions: they provide the components (the agents) that encapsulate essential features such as cognition and autonomy, as well as the notions required to put systems together (agent societies) and make them work in the real world (the MAS environment) [1]. In addition, a few decades of intensive academic and industrial research on MAS, and their integration with the most recent advances in AI techniques and IoT technologies, have promoted the development and widespread diffusion of novel agent-oriented techniques, methods, and tools, and paved the way towards the acceptance of MAS as the forthcoming industrial mainstream for complex yet reliable intelligent systems.

Yet, the MAS scenario is nowadays so articulated that the transition is going to keep both researchers and practitioners busy for at least two more decades before all aspects and issues concerning MAS techniques and methods are fully understood and addressed within the many application scenarios where MAS are required to operate. Providing a platform where MAS researchers can share their most novel and exciting findings and results is thus crucial to support and promote the development and spread of new MAS models and technologies: this is, in fact, the main motivation behind this special issue.

#### **2. Overview**

Before delving into the individual contributions, a few general statistics and observations help provide an overview of the content and outreach of this special issue:


These numbers are in line with the previous edition of the special issue [2], except for a lower acceptance rate, which reflects a more selective review process meant to increase the quality of the special issue and its potential impact on research and practice.

Figure 1 shows the wordcloud generated from the full text of the published papers.

**Figure 1.** Wordcloud generated from the full text of each publication of the special issue.

The most mentioned words are "agent" and "model", closely followed by "simulation" and "system", and then by "task" and "data". The former four words are not surprising and confirm the editorial of the previous edition: MAS are well known and widely adopted, even outside the strict boundaries of computer science and engineering, precisely for modeling and simulating complex systems, in fields as diverse as bioinformatics, social sciences, network science, supply chains, and logistics. The latter two may instead appear rather novel, and point to the increasingly widespread exploitation of MAS for new purposes, such as collecting, managing, and analyzing data to turn it into actionable knowledge, and supporting the execution of tasks requiring peculiar capabilities such as reasoning, reactiveness to environmental conditions, and compliance with complex inter-dependencies.

Other highly relevant words, serving as clues to prominent application areas and kinds of systems, are the following:


In our previous editorial we analyzed a similar wordcloud from the perspective of the topics covered by the publications, which were: agent-based modeling and simulation, situated systems, socio-technical systems, and semantic technologies. Except for the latter, the relevance of these topics is confirmed by the current edition. That said, this year we take a different point of view, by answering the following question: what are MAS used for? In the following sections we classify the papers included in this special issue according to the following four usage destinations, and summarize their main contributions:

*Decision support*—papers gathered in this category exploit MAS, in particular their ability to perform distributed reasoning, to deliver insights about a certain topic, with the goal of enhancing humans' decision-making processes and lowering their cognitive overhead.

*Modeling framework and methodology*—in this category, what matters most is the expressive power of the agent abstraction as a conceptual tool supporting the engineering of complex systems featuring autonomous components.

*Programming abstraction and simulation framework*—complementary to the previous category, here MAS are mostly used for their operational features, as software tools enabling the development and execution of the complex systems already mentioned, especially in simulated scenarios.

*Execution infrastructure*—here, MAS are used as the backbone infrastructure executing the computations demanded by the application at hand, leveraging MAS themselves as an efficient and effective distributed computing platform.

#### **3. MAS for Decision Support**

Interestingly enough, the category that is new with respect to the previous editorial is also the most represented: 4 out of the 11 published papers exploit MAS to deliver decision support.

In [3], the authors exploit agent-based modeling and simulation to define a photovoltaic adoption prediction model based on self-reported behavior, then refined by a genetic algorithm looking at observed data. The goal is to support energy-related decision making by policy makers, by modeling and predicting households pondering whether to adopt photovoltaic energy solutions. Here, the agent abstraction is useful to model both individual behavior driven by rational utility functions (such as economic savings) and the social dimension stemming from neighborhoods influencing each other's decisions.

In [4], an MAS is used as the operational backbone of a game-theoretic approach to task allocation under strict spatio-temporal constraints, applicable to decision support in many critical scenarios such as disaster relief. Here, the main motivation for using an MAS lies in preferring speed over optimality when converging to useful allocations, as the targeted scenarios tolerate suboptimal solutions but not late ones. As such, an MAS is built to run a scheduling algorithm rooted in game theory in a decentralized fashion, improving convergence time while trading off optimality.
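To make the idea concrete, here is a minimal sketch of decentralized best-response dynamics, the general family of game-theoretic schemes the description above evokes; the utility function, agent positions, and parameters are illustrative assumptions, not taken from [4]:

```python
import random

def best_response_allocation(agents, tasks, utility, max_rounds=100):
    """Decentralised best-response dynamics: each agent repeatedly switches
    to the task maximising its own utility given the others' current
    choices, until no agent wants to deviate (a Nash equilibrium)."""
    allocation = {a: random.choice(tasks) for a in agents}
    for _ in range(max_rounds):
        changed = False
        for a in agents:
            best = max(tasks, key=lambda t: utility(a, t, allocation))
            if best != allocation[a]:
                allocation[a] = best
                changed = True
        if not changed:  # equilibrium reached: fast, but not necessarily optimal
            break
    return allocation

# Toy setting: agents prefer nearby tasks, but crowded tasks pay less.
positions = {"a1": 0.0, "a2": 0.4, "a3": 0.9}
task_pos = {"t1": 0.1, "t2": 0.8}

def utility(agent, task, allocation):
    crowd = sum(1 for x, t in allocation.items() if t == task and x != agent)
    return -abs(positions[agent] - task_pos[task]) - 0.5 * crowd

alloc = best_response_allocation(list(positions), list(task_pos), utility)
```

Best-response dynamics converge quickly in congestion-like games such as this one, which is exactly the speed-over-optimality trade-off the paper leverages.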

In [5], an MAS is proposed as a platform for instrumenting a collective of neural-network-based classifiers by adopting a crowdsourcing metaphor: each classifier is an agent, each classification is an opinion, and the overall prediction delivered by the system is the aggregation of the crowd's opinions. The goal is to improve prediction accuracy and transparency by letting agents interact socially to exchange knowledge (e.g., new features), gain reciprocal trust, and change opinion when given enough evidence. The agent abstraction is thus used mostly for its autonomy, and the MAS as the enabler of the sociality needed to improve transparency and accuracy through the exchange of information.
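The crowdsourcing-style aggregation can be illustrated with a trust-weighted vote; the agent names, labels, and trust scores below are hypothetical, and the actual aggregation scheme of [5] may differ:

```python
from collections import Counter

def aggregate_opinions(opinions, trust):
    """Trust-weighted vote over classifier-agent opinions: each agent's
    label counts proportionally to the trust it has earned, and the
    crowd's answer is the label with the highest total weight."""
    scores = Counter()
    for agent, label in opinions.items():
        scores[label] += trust.get(agent, 1.0)  # default trust for newcomers
    return scores.most_common(1)[0][0]

# Three classifier agents voting on one input.
opinions = {"clf_a": "cat", "clf_b": "dog", "clf_c": "cat"}
trust = {"clf_a": 0.9, "clf_b": 0.7, "clf_c": 0.4}
prediction = aggregate_opinions(opinions, trust)  # "cat": 0.9 + 0.4 > 0.7
```

Updating the `trust` map as agents prove right or wrong over time is what lets the collective change its mind when given enough evidence.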

In [6], the authors target the green coffee supply chain with an agent-based decision support system devoted to planning production scheduling in the face of fluctuating and peak demand. The modeled supply chain is rather complex, with plenty of interdependencies amongst the activities and variables influencing the decision process at each step. An MAS is thus used to tame this complexity, by modeling all the different tasks and processes as autonomous agents, each undergoing its own reasoning to take decisions while interacting with others as needed.

In spite of the heterogeneity of the application domains and techniques adopted, all the described approaches leverage MAS central notions to improve the delivery of decision-support functionalities, either through simulation [3,4] or as an operational platform [5,6].

#### **4. Agent-Based Modeling and Methodology**

As witnessed by the following papers, modeling complex systems of any sort within heterogeneous scientific disciplines is a staple MAS application, either for observing such systems to derive properties, patterns, and laws, or for crafting them in compliance with agent-oriented methodologies so as to obtain MAS non-functional properties (decentralization, reactiveness to change, etc.).

In [7], an MAS is used in the context of the social sciences to model the residents of a smart city, so as to study their social engagement over time (e.g., daytime vs. nightlife) and across space (city center vs. business district). The idea behind such modeling is that the activities of residents are influenced by what others are doing and by environmental conditions, such as the presence of shops, events, etc.; hence, the social and situated nature of agents in an MAS is a perfect fit. Based on this modeling, the authors study various aspects of social and institutional engagement, such as mutual trust and trust in institutions.

The application context of [8] is totally different, as it deals with the observation of emergent properties, in particular scale-free features, in robotic systems implementing swarming behaviors such as collective foraging. The authors aim at testing whether scale-free attributes may also arise in artificial collective systems inspired by biological ones, such as ant colonies, and whether such attributes positively influence overall system performance. In this context, the agent abstraction is particularly useful for modeling the individual behavior of robots, which depends on environmental conditions (situatedness) and peers' actions (sociality).

In [9], we are introduced to yet another research field exploiting the expressive power of the agent abstraction for modeling, while also considering a methodological perspective: psychology, in particular, educational game design. The authors describe a design process for educational games which heavily relies on the agent abstraction for modeling both human behavior and the software system engaging players: for instance, the admissible actions at each stage of the game, their effect on the system or the player(s), and the modalities of interaction between players and with the software control system. To further consolidate the agent metaphor, the authors also consider virtual avatars representing players and system characters, so as to leverage more natural interactions.

#### **5. Agent-Oriented Programming and Simulation**

Complementing the modeling aspect discussed in the previous section, the two following works exploit agent-oriented programming to deliver software tools enabling the design and deployment of MAS and agent-based simulations.

In [10], a model-driven approach is proposed to reconcile the different existing organizational models meant to let MAS designers operationally define the social dimension of an MAS. Organizational models respond to the need to guarantee the correctness of the overall MAS behavior even though individual agents are autonomous entities and, as such, able to choose their own course of action in isolation, while pursuing their own individual goals. Through these models and the corresponding software tools, MAS designers have ways of specifying cooperation protocols amongst agents, taming their individual behaviors and steering them towards a coherent system-level goal.

In [11], the focus of the contribution, and its main novelty, is the seamless deployment of agents on simulated and production environments, with as few modifications as possible to the agent logic. The authors propose a coherent and integrated Python development framework encompassing the testing, simulation, validation, and deployment stages of software production, as well as the autonomy, reactiveness to environment events, and social ability facets of an MAS. The proposed framework, ARPS, revolves around a few crucial architectural components: the agent manager, the agents themselves, a discovery service, and a discrete event simulator. Facilities for dealing with sensing and actuating in either simulated or physical environments are made available, and agent behavior, as well as social interactions, can be defined through policies dictating which actions correspond to which events.
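The policy-based, event-driven agent style described here can be sketched in a few lines of Python; this is an illustrative sketch, not the actual ARPS API, and all class and method names (`Agent`, `on`, `perceive`) are hypothetical:

```python
class Agent:
    """Minimal reactive agent: policies map event types to actions,
    mirroring the 'which actions correspond to which events' idea."""

    def __init__(self, name):
        self.name = name
        self.policies = {}  # event type -> action callable
        self.log = []       # actions taken, in order

    def on(self, event_type, action):
        """Register a policy: when event_type is perceived, run action."""
        self.policies[event_type] = action

    def perceive(self, event_type, payload):
        """Dispatch an incoming event (simulated or physical) to its policy."""
        action = self.policies.get(event_type)
        if action is not None:
            self.log.append(action(self, payload))

# A simulated temperature reading drives the same agent logic
# that a physical sensor would drive in production.
a = Agent("hvac")
a.on("temperature", lambda ag, t: "cool" if t > 25 else "idle")
a.perceive("temperature", 30)  # a.log is now ["cool"]
```

Because the agent only sees events, swapping the simulated event source for a physical one leaves the agent logic untouched, which is the seamless-deployment property the paper aims for.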

Both the aforementioned contributions aim at providing general-purpose agent-based solutions to let other developers build their own MAS.

#### **6. MAS as Execution Infrastructure**

The last usage destination—that is, exploiting an MAS as the execution infrastructure for a given system—is quite common in MAS literature, as the agent abstraction is a general-purpose programming concept with applications in many business domains and for heterogeneous systems.

In [12], an MAS is used in the context of multi-robot formation: first, a distributed consensus algorithm is simulated on multi-agent-based simulation software to assess the desired properties despite data uncertainty and communication delays; then, the algorithm is implemented as an MAS and deployed on a real robotic platform comprising four mobile robots, further assessing its effectiveness. In this work, the added value of the MAS lies in its natural predisposition to distribution and its tight coupling with environment sensing and actuation, which are necessary features of multi-robot systems.
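To illustrate the kind of algorithm involved, here is a minimal linear average-consensus update, a standard textbook scheme rather than the specific algorithm of [12]; the ring topology and values are illustrative:

```python
def consensus_step(values, neighbours, eps=0.2):
    """One synchronous step of linear average consensus: each agent
    nudges its value toward the values of its neighbours."""
    return {
        i: v + eps * sum(values[j] - v for j in neighbours[i])
        for i, v in values.items()
    }

# Four robots on a ring, agreeing on a common heading (degrees).
values = {0: 0.0, 1: 90.0, 2: 180.0, 3: 90.0}
neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
for _ in range(50):
    values = consensus_step(values, neighbours)
# All values converge to the initial average, 90.0
```

Each agent only uses its neighbours' values, so the computation is fully decentralized; the step size `eps` must be small enough (below 2 divided by the maximum node degree) for convergence.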

In [13], instead, an MAS is used as the platform for training unmanned surface vehicles: agents in the MAS correspond to the vehicles' controllers and implement a distributed learning algorithm meant to achieve optimal coordinated behavior. Here, the agent abstraction is chosen for its capability to express adaptive behavior by learning new behavioral rules (akin to plans in BDI architectures) while operating.

Both contributions showcase the ability of MAS architectures to provide a suitable infrastructure for effective and efficient execution of heterogeneous tasks (consensus in the former, learning in the latter).

#### **7. Conclusions**

The large number of submissions to this second installment of the MAS special issue makes it clear that there is still a huge space that initiatives of this sort can help to cover. In addition, the quality of the papers collected and published here testifies to the effort that the scientific community is devoting to the development of novel MAS models, techniques, and methods. The breadth of the MAS-related topics covered by the submitted papers (which for obvious reasons cannot be fully analyzed here) also witnesses the ever-expanding reach of agent-based techniques and solutions.

This is why this special issue, on the one hand, provides readers with a very representative picture of the state of the art of MAS research and, on the other hand, is far from conclusive under any possible viewpoint. The articulation and expansion of the MAS field leave the space open for many other initiatives like this special issue, so we expect to see many more of them in the next few years.

In the meanwhile, we are quite confident that the readers of Applied Sciences will be able to appreciate the extent of the application scenarios that MAS are going to cover in the coming decades, as they become the conceptual and technical foundation for the next generation of complex intelligent systems.

**Author Contributions:** Conceptualization, S.M. and A.O.; methodology, S.M.; software, S.M.; validation, A.O.; writing–original draft preparation, S.M.; writing–review and editing, A.O.; visualization, S.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The guest editors would like to thank the Applied Sciences Editorial Office, in particular the reference contact Daria Shi, for the extreme efficiency and attention devoted to the handling of papers, from submission to publication, through the peer review process. We would also like to thank the many reviewers participating in the selection process (3 to 4 on average) for their valuable constructive criticism, often appreciated by the authors themselves. Last but not least, our gratitude goes to the authors who submitted their papers, and to the many readers who already generated citations and downloads.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Merging Observed and Self-Reported Behaviour in Agent-Based Simulation: A Case Study on Photovoltaic Adoption**

#### **Andrea Borghesi \* and Michela Milano**

DISI, University of Bologna, 40136 Bologna, Italy; michela.milano@unibo.it
**\*** Correspondence: andrea.borghesi3@unibo.it

Received: 2 April 2019; Accepted: 15 May 2019; Published: 22 May 2019

**Abstract:** Designing and evaluating energy policies is a difficult challenge because the energy sector is a complex system that cannot be adequately understood without using models merging economic, social and individual perspectives. Appropriate models allow policy makers to assess the impact of policy measures, satisfy strategic objectives and develop sustainable policies. Often the implementation of a policy cannot be directly enforced by governments, but falls to many stakeholders, such as private citizens and enterprises. We propose to integrate two basic cornerstones to devise realistic models: self-reported behaviour, derived from surveys, and observed behaviour, from historical data. The self-reported behaviour enables the identification of the drivers and barriers pushing or limiting people in their decision-making process, while the observed behaviour is used to tune these drivers/barriers in a model. We test our methodology on a case study: the adoption of photovoltaic panels among private citizens in the Emilia–Romagna region, Italy. We propose an agent-based model devised using self-reported data and then empirically tuned using historical data. The results reveal that our model can predict the photovoltaic (PV) adoption rate with great accuracy and thus support the energy policy-making process.

**Keywords:** simulation model; multi-agent systems; photovoltaic energy; parameter fine-tuning; self-reported behaviour; predictive model

#### **1. Introduction**

The European Union is deeply committed to curtailing its greenhouse gas emissions by at least 20% by 2020, with respect to 1990 levels, as stated in the sustainable growth strategy outlined in [1]. The path to achieving such a goal passes through an increase of up to 20% in the share of renewable energy sources in final energy consumption and a 20% rise in energy efficiency. All EU members and regions should put effort in this direction to contribute to these common objectives. For instance, Italy was supposed to reach a 17% share of final energy coming from renewable sources by 2014, a target that has been reached and slightly surpassed [2].

The complex task of enforcing these guidelines is shouldered by national and regional policy makers. Energy policies have a strong impact on sustainable development, and they influence the economy, society and the environment. Policy makers have to devise plans targeting strategic objectives, e.g., cutting greenhouse emissions, while satisfying different constraints (limiting pollutant emissions, not exceeding a financial budget, etc.) and respecting the EU guidelines. After having been devised, the plans need to be enforced with implementation instruments (from incentives to investment grants, passing through tax exemptions) [3–5]. One aspect that tends to be severely underestimated while planning energy policies is the strong influence of human behaviour together with social dynamics; it is often studied under the assumption that consumers are rational and guided only by financial and economic drivers [6–8], which severely affects the accuracy and realism of the study. In fact, the decision process of the involved agents (i.e., private citizens) is deeply influenced by non-economic motivations, such as social influence, peer pressure, bandwagon effects, lack or wealth of knowledge, risk aversion, etc. [9–12]. Properly understanding the decision-making process is critical to better influence the interested parties' behaviour and steer them toward good practices and policy objectives. In this context there is an urgent need for appropriate and accurate models enabling policy makers to design, evaluate and implement energy policies that satisfy strategic objectives and develop sustainable strategies with a strong impact on economy, society and environment.

We propose to merge two types of knowledge in the model definition: (1) self-reported behaviour, derived from large-scale surveys and interviews, and (2) observed behaviour, based on real data measuring the actual effect of the target energy policy. The models are used to bridge the gap between these two behaviours and enable a better understanding of private citizens' decision-making processes. We claim that social and economic drivers and barriers can be extracted from quantitative analysis of survey data, whilst a deeper understanding of how these drivers operate and interact can be derived from interview findings. On the basis of these drivers and barriers, we build a parametric model whose parameters can be empirically tuned so that the model reproduces the observed behaviour. We expect the parameter tuning to generate different outputs (i.e., parameter values for different drivers and barriers) for different classes of private entities (private citizens, enterprises, etc.) and for different countries and geographical situations.

The final outcome of merging self-reported and observed behaviour is the creation of predictive models with the ability to forecast the stakeholder behaviour in the presence of specific energy policies, financial and economic situations. These predictive models can be inserted into simulations and used by policy makers in a what-if fashion, namely by proposing alternative scenarios and observing the emerging behaviour of consumers related to energy efficiency and overall cost.

In this work we focus on policies for promoting energy production from renewable energy sources and, in particular, on photovoltaic (PV) power generation. We use, as a case study, self-reported and observed behaviour in the Italian region of Emilia–Romagna, where the majority of the total installed photovoltaic power is generated by small/medium panels installed by private citizens and enterprises. For this reason, regional policy makers cannot directly decide the total installed power, but have to push PV power generation through indirect means, usually in the form of incentives for PV energy. The decision to install a PV panel is not driven exclusively by economic/technical considerations (although these aspects clearly have a significant impact): it also involves different factors determined by human behaviour and social interactions [13,14]. As observed behaviour, we employ data regarding the historical yearly installation rate of new photovoltaic panels and the total amount of installed photovoltaic power reported by the national and regional governments. On these data we craft an agent-based model for simulating the adoption of photovoltaic panels. We consider individual households as the actors populating the simulation environment and deciding whether or not to install a PV panel. The behavioural rules of the agents are devised using self-reported data collected through surveys and questionnaires conducted among private citizens. From these data we derive the drivers and barriers that influence the adoption of a PV panel. The importance of each factor is decided during the following phase, when we use the observed data to fine-tune the parameters of the model. The model takes into account geographical, economic, and social aspects.

The validation and final evaluation of the proposed model have been performed over a period of 11 years, by comparing the historical PV power installation trend in a certain period to the one generated by the agent-based simulator. The historical data collected over this time span are divided into two subsets for a two-fold purpose: (1) tuning the agent-based model's parameters (combining self-reported and observed behaviour), and (2) testing the accuracy of the approach by assessing its predictive capacity. For this purpose, the first seven years were used for parameter tuning, and for the remaining years we compare the historical data with the simulated behaviour: a small discrepancy would mean good model accuracy, whereas a large gap would indicate a model of little practical use. The experimental results highlight that our model makes it possible to predict the future trend of installed PV power; this information and predictive capability can greatly help policy makers in their task.
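The temporal hold-out evaluation described above can be sketched as follows; the data series, the simulator, and the error metric (mean absolute percentage error) are illustrative stand-ins, not the paper's actual data or model:

```python
def split_and_score(years, observed, simulate, n_train=7):
    """Temporal hold-out: tune on the first n_train years, then score the
    simulator's predictions on the remaining years with MAPE."""
    train, test = years[:n_train], years[n_train:]
    predicted = simulate(train, test)  # model tuned only on `train`
    errors = [abs(predicted[y] - observed[y]) / observed[y] for y in test]
    return sum(errors) / len(errors)

# Hypothetical installed-PV-power series (MW) over 11 years.
years = list(range(2006, 2017))
observed = {y: 10 * (y - 2005) ** 2 for y in years}

def naive_simulator(train, test):
    # Placeholder model: extrapolate the last training step linearly.
    last, prev = observed[train[-1]], observed[train[-2]]
    step = last - prev
    return {y: last + step * (y - train[-1]) for y in test}

mape = split_and_score(years, observed, naive_simulator)  # small = accurate
```

Holding out the last years, rather than a random subset, matters here: the model must extrapolate forward in time, exactly as a policy maker would use it.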

The structure of the paper is the following. In Section 2 related works are discussed. Section 3 provides a general overview of the proposed approach. Section 4 presents the surveys used to identify drivers and barriers governing people's decisions and the method to derive the model for the agents' behaviour. Then, Section 5 describes the proposed agent-based model. The method used for tuning the model's parameters is described in Section 6. Section 7 reports the evaluation of the proposed approach, validating the fine-tuned agent-based model and assessing its accuracy. Finally, Section 8 concludes the paper, summarizing the obtained results and suggesting future research directions.

#### **2. Related Work**

The adoption of renewable energy sources, such as photovoltaic panels, can be framed as an innovation diffusion problem, an issue that has been the subject of much research. Several findings suggest that the diffusion of an innovation is a social process. A common methodology for dealing with this problem is agent-based modeling and simulation, where the agents are connected to form an interconnected network; agent-based models are also referred to in the literature (and in the rest of this paper) as ABMs. Agent-based modeling is a computational approach that provides researchers with a tool for creating, analysing and experimenting with models composed of agents that interact within an environment. Agent-based models are a simplified representation of reality that can be used to explore aspects that would be harder to study without the aid of computational experiments [15]. Agents are usually distinct parts of a program that represent social actors/individuals, organizations such as firms and enterprises, or bodies such as nation-states. They are programmed to react to the computational environment where they reside; this "simulated" environment is a representation of the real environment where the social actors operate [16].
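A minimal ABM loop, matching the generic description above, might look like this; the update rule and parameters are toy illustrations, not any model from the cited literature:

```python
import random

def run_abm(n_agents, n_steps, rule, seed=42):
    """Minimal agent-based simulation loop: each step, every agent updates
    its state given the shared environment, and the environment is then
    re-derived from the agents' new states."""
    random.seed(seed)  # reproducible initial population
    states = [random.random() for _ in range(n_agents)]
    for _ in range(n_steps):
        env = sum(states) / n_agents  # simple aggregate environment signal
        states = [rule(s, env) for s in states]
    return states

# Toy rule: agents drift toward the population average (herding effect).
final = run_abm(100, 20, rule=lambda s, env: s + 0.3 * (env - s))
spread = max(final) - min(final)  # agents converge: spread shrinks
```

Real ABMs replace the scalar state and global environment with richer agent attributes, interaction networks, and spatial structure, but the simulate-then-observe loop is the same.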

In particular, ABMs have been used to study how innovative technologies spread in the real world [17–20]. It has been noted that the adoption rate of an innovation does not depend exclusively on economic factors (i.e., costs or available budget); many other aspects can have a profound influence. For instance, Abrahamson et al. [21] describe a threshold ABM where the adoption rate of a new technology is influenced by the "bandwagon effect", with new adopters facilitating the spread of knowledge, which in turn increases the adoption of the innovative technology by new agents. Similarly, Chatterjee et al. [22] consider that potential adopters can have precise information about the cost of an innovative technology but can only estimate its benefits and real value; hence, the perceived worthiness is an important factor. The main idea is that information about an innovative technology spreads among a growing network of agents through communication with previous adopters, so that the uncertainty about the innovation's potential decreases. PV technology diffusion can be cast as a problem of innovation adoption; hence, these insights are partially incorporated in the model proposed in this paper.

Extensive research has been devoted to investigating PV technology via ABMs [23–26], with a special focus on rooftop PV panels—systems typically installed by private citizens and small enterprises. The rest of this section reviews some of the approaches proposed in the last few years.

Zhao et al. [27] describe an ABM for studying the diffusion of PV systems where agents are homeowners who decide whether or not to install a PV panel. They consider four main factors that affect the agents' decision: payback period, household income, neighbourhood and advertisement. The final decision of each agent is based on a linear combination of these four factors, called "desire level". If the desire level is above a certain threshold, the household opts for installing a PV panel. Selecting the correct value for the threshold is not an easy task: it strongly impacts the decision algorithm of the agents, but the authors do not offer a general method to compute it. Instead, domain experts' knowledge is used to select a set of realistic values, which have to be tested and validated on test-case scenarios (without comparison with historical trends). With our approach we aim at finding the correct values for the ABM parameters through a fine-tuning process guided by observed data.
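A minimal sketch of such a threshold rule, in the spirit of Zhao et al. [27], may help fix ideas. The factor values, weights and threshold below are illustrative only, and the function names are ours, not taken from the original paper:

```python
# Sketch of a threshold decision rule in the spirit of Zhao et al. [27].
# All numbers, weights and the threshold are illustrative.

def desire_level(payback, income, neighbourhood, advertisement, weights):
    """Linear combination ("desire level") of the four decision factors."""
    factors = (payback, income, neighbourhood, advertisement)
    return sum(w * f for w, f in zip(weights, factors))

def decides_to_install(agent_factors, weights, threshold):
    """The household installs a PV panel when the desire level
    exceeds the threshold."""
    return desire_level(*agent_factors, weights) > threshold

# Example: all factors normalised to [0, 1], equal weights.
weights = (0.25, 0.25, 0.25, 0.25)
print(decides_to_install((0.9, 0.8, 0.6, 0.4), weights, threshold=0.5))  # True
print(decides_to_install((0.2, 0.3, 0.1, 0.1), weights, threshold=0.5))  # False
```

The difficulty the authors face is precisely the choice of `threshold` (and of the weights), for which no general method is given.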

Extending the work of Zhao et al., Palmer et al. propose a different ABM [28]: again, the agent decision criterion is based on four different factors, but in this case these factors are weighted according to the agents' social class. Each agent (corresponding to a household) is associated with a specific social class; a small-world network connects all agents, and households belonging to the same classes tend to be linked together. The model parameters are calibrated using the PV installation trend in Italy during the 2006–2011 period, but the entire data set is used for training, therefore no validation is performed on new data. This poses a risk of overfitting the model parameters to the particular historical period taken into consideration. The risk is increased further by the selected period: in 2006–2011 the PV installation rate was mainly governed by a set of incentives offered by the Italian government that changed considerably in 2012, as described in detail by Borghesi et al. in [29].

The approaches listed so far discounted the geographical location of buildings. Robinson et al. [30] introduce this element and integrate data coming from a geographic information system (GIS) with an ABM, with the goal of analysing the diffusion rate of PV panels. The addition of the actual topology of the target area makes it possible to include the effects of solar exposure and population density on the diffusion of PV systems, thus improving the accuracy of the model. The parameter calibration is done using real data on historical PV adoption in the city of Austin, Texas. While the results are interesting, no validation has been performed yet, i.e., all the data has been used to fine-tune the model, whose accuracy w.r.t. new observed behaviour has not been computed. Davidson et al. [31] take into account geo-spatial information as well as population demographics in order to forecast photovoltaic adoption trends. Their goal is to understand the best predictors for the installation of PV panels. The analysis highlights that a relatively small subset of geo-spatial data can be used to obtain estimates (in terms of PV adoption trend) as accurate as those obtained with much larger and more comprehensive geo-spatial data sets. This work does not develop an agent-based model and is mostly focused on understanding which geo-spatial factors have the greatest impact on PV adoption, and thus is not directly comparable with the approach proposed in this paper.

Zhang et al. [32] outline an ABM to study the adoption (both at the individual and community level) of rooftop PV panels, considering San Diego County as a case study. The key point of the proposed approach is to learn a model of the behavior of individual agents using combined data on individual adoption characteristics and property assessment; the learned model is then integrated into the agents involved in the simulation. They also employ their system to evaluate different policy strategies targeted at fostering PV adoption. The proposed model is calibrated using observed PV adoption rates in the city of San Diego, California. The authors also propose a preliminary validation of their model, comparing its prediction accuracy to that of a baseline model (one taking into account fewer factors than the presented one). The validation lacks a full comparison of the model's predicted behavior with the observed one.

Macal et al. [33] describe an agent-based model (called BE-Solar) that incorporates a social and behavioral decision framework for understanding the technology adoption process. The main limitation of this approach is the lack of model calibration and validation. Rai et al. [34] present an empirically driven agent-based model of technology adoption applied to residential solar photovoltaics. The variables describing the agents' behaviour are fine-tuned using historical data. In their work, they propose a theoretically based framework and consider multiple validation criteria.

Agent-based models have also been employed to simulate policy scenarios and provide recommendations. For example, Lee et al. [35] propose an ABM to model the decision-making process of homeowners when buying and installing energy-efficient technologies in their homes. Homeowners' decisions are based on a simple additive weighting algorithm that estimates the utility values of different options, ultimately selecting the one with the maximum utility value. The utility values of different options are calculated based on a combination of empirical factors (derived from housing stock data), social factors, and policy regulations. Installations lead to altered energy demand and CO2 emissions. The model was partially calibrated using observed data; due to the limited availability of historical data, only a couple of technologies were subjected to calibration. Although this is a clear limitation, validation was not the main focus of the paper, which instead was mostly aimed at providing a tool for comparing different scenarios.

Johnson et al. [36] model households' adoption of photovoltaic solar panels following an approach where household agents initially make decisions based on their subjective beliefs, awareness, and attitude towards the technology. These factors determine the chance that homeowners meet with a photovoltaic installation company, at which point they become rational profit-maximising consumers, weighing up the costs and benefits (subsidies, etc.) of installing solar panels. This model enabled the researchers to make recommendations to regional government on the potential impact of incentive policies, and on how different policies compared in terms of costs, installed energy capacity, and participation rates. No validation technique for this model was proposed, as this work is more focused on studying the theoretical impact of different policies than on providing a predictive tool.

Adepetu et al. [37] employ an ABM to study the impact of realistic incentive mechanisms on the adoption and diffusion of PV-batteries. They observe that while many different types of incentives have been proposed in the last few years, those incentives did not have the same effect in different parts of the world, due to the underlying different conditions and contexts (referred to as jurisdictions). Hence, in their work they propose jurisdiction-specific ABMs in order to find the best incentive policies. As a case study they consider two distinct jurisdictions, Ontario and Germany. The agents' decision process is partially based on questionnaires (self-reported behaviour), and the tuning of some of the model's parameters is performed using historical data. The proposed approach is interesting, but it is explicitly focused on PV-batteries and does not consider PV panels on rooftops, hence it is not directly comparable to ours; moreover, the authors validate the models using historical data (looking for the parameter values that best fit the observed data) but do not evaluate the predictive capacity of the proposed ABMs—the main focus is the comparison of incentive policies among different jurisdictions. Furthermore, the paper does not provide statistical, quantitative measures of the accuracy of the fitting method, hence a comparison is not possible.

Sinitskaya et al. [38] describe an agent-based model to explore the impact of PV panel installers on the adoption rate of PV technology among residential households. PV installers and PV household buyers are modeled as agents, and the explicit goal is to maximise the profits of panel installers—hence their work is not directly comparable to our methodology. However, a shared aspect is the use of interviews with the involved stakeholders (investors and homeowners) in order to devise a proper (realistic) decision algorithm for the model's agents.

Lee et al. [39] propose to combine ABMs with a logistic regression model to estimate the correct values for the model parameters. They apply this hybrid scheme to the case study of rooftop PV panel adoption in a neighborhood located in Seoul, South Korea. The agents in the ABM are buildings placed in a geographically accurate simulated world, thanks to a geographical information system (GIS). The owners of each building decide whether to install a PV panel or not, depending on multiple factors (economic, social, geographic). The parameters of each agent are tuned using logistic regression and very fine-grained, building-level data collected over multiple years. The validation is performed using the cumulative observed data (the sum of all adopted PV systems in the neighbourhood). In this way, this work uses the observed behaviour to obtain realistic models (the logistic regression is guided by historical data), although the decision algorithm of the agents is not based on self-reported behavior. Moreover, the validation is performed on the same time period used for tuning the parameters (albeit at a different granularity), and thus no indication is provided of the predictive capacity of the approach.

While there are recent works that strive to bridge the gap between self-reported and observed behavior (for instance [37,39]), they do not explicitly frame the problem in these terms, and thus only consider a partial aspect of the overall issue. For example, Lee et al. [39] do not consider self-reported behaviour. Furthermore, the majority of the approaches terminate their analysis at the first step of the validation phase: the ABM parameters are tuned using historical, observed data, but no study of the prediction capability of the model is performed. These approaches are thus proven to be well suited for describing the observed data (a worthy task), but no guarantee is given of their usefulness for predicting future trends—a very important aspect for policy makers. On the contrary, our approach advances the state of the art in two ways: (I) it explicitly states the need to consider both self-reported and observed behaviour, as only by merging them is it possible to obtain accurate ABMs; (II) it is validated on a predictive task, that is, the model parameters are tuned using a subset of the observed data while the test is performed on a separate subset (a different time period).

#### **3. Methodology Overview**

This section provides an overview of the proposed methodology. The rest of the paper is devoted to describing how the proposed approach has been applied to photovoltaic adoption in the Emilia–Romagna region. Figure 1 depicts the scheme of the methodology. The main idea was to start from the self-reported behaviour (collected through questionnaires and interviews) and extract the drivers and barriers influencing the stakeholders' decision process. These decision factors were encoded in an agent-based model via a set of parameters—the relative values of the parameters indicate the magnitude of the impact of the associated factor. This ABM served as a template for the decision process inferred from the extracted drivers and barriers.

**Figure 1.** Methodology scheme.

A key part of building an ABM was assigning values to the model parameters. Since it was very hard to deduce appropriate parameter values from self-reported behaviour alone, the model template underwent a fine-tuning phase where the parameters were empirically tuned using part of the observed behaviour. The observed behaviour was a time series describing the emergent overall behaviour that we want to simulate. Adopting standard machine learning terminology, we partitioned the observed behaviour into training and test sets. We trained the model (changing the parameters' weights) to make the output of the simulator as close as possible to the training portion of the observed behaviour, and we validated it on the test set.

After the fine-tuning (also referred to as calibration in the literature), the final outcome was a simulator able to predict the behaviour of interest with high accuracy. At this point, the predicted behaviour and the test set (a subset of the observed behaviour) can be used to assess the quality of the approach; we call this phase model validation. The model validation stage can have different outcomes, depending on the accuracy of the predicted behaviour as measured by the validation error (the difference between the predicted and the observed behaviour). If the accuracy was not high enough, it was possible to recalibrate the ABM parameters, for example by using a different training set or by improving the calibration method itself (i.e., letting the fine-tuning algorithms run for a longer time, thus exploring more parameter configurations). If the validation error is too large, a simple recalibration would probably not suffice; in this case the ABM template should be revised in order to better encapsulate the drivers and barriers previously identified. If the validation results are very poor, the best recourse would be a major overhaul of the ABM, possibly repeating the questionnaire and interview phase with different questions, in order to gain a deeper grasp of the agents' decision process. This second refinement should be performed by domain experts and is outside the scope of this paper.

A key element of the proposed approach was the combination of agent-based models with automated fine-tuning techniques to calibrate/validate the models. This combination led to a method capable of filling the gap between observed and predicted data. We therefore merged methodologies belonging to both agent-based modelling and automated parameter tuning, moving towards a research area that has been very rarely explored so far. In our case, applying an automated tuning mechanism means exploring the configuration space of the ABM with the goal of finding the parameters with the best performance, that is, those parameters that minimize the distance between the predicted behaviour and the observed one. As shown in Section 7, an ABM validated using historical data can also be used for predictive tasks with good accuracy, hence helping policy makers devise policies and strategies.

#### **4. Driver and Barrier Extraction**

The first challenge that needs to be addressed is the definition of the agents' behaviour. The factors (drivers and barriers) influencing the decision-making process of the actors of the simulation model need to be identified. Drivers are those elements that lead to making a specific energy efficiency investment, while barriers are those elements against the investment. These factors were extracted from a large set of empirical data, gathered through questionnaires and interviews conducted in the Emilia–Romagna region as part of the ePolicy European research project [40]. The empirical data collection took place between March and August 2013. In this section we provide a partial overview of the methodology and the results of the surveys; for a complete and detailed discussion we refer to the work of Balke and Gilbert [41].

The tools used to collect the data were the following:

• questionnaires, and

• interviews with the involved stakeholders.
The results of both questionnaires and interviews indicated that a series of economic and social elements come into play when deciding to install a PV panel. Some of the factors involved in the decision process were purely economic (for example, the profitability of the investment) or physical (such as the type of house and/or the availability of enough roof surface). Other elements reflected personal values and motivations (such as environmental awareness) and represented the social component of the decision process (for example, the diffusion of PV-related information within peer networks). A very important aspect to be considered in the simulation is that many households pondering a PV installation are not able to make a clear cost-benefit calculation, but rather act on (other) perceived benefits of photovoltaics.

Some of the drivers identified are the following:

• high energy costs and potential saving after the PV panel installation,


Conversely, these were some of the main barriers:


As illustrated in the following sections, the drivers and barriers identified via interviews and questionnaires form the basis of the decision model of each agent in our simulation. However, even when drivers and barriers have been identified, it is not clear how these parameters should be weighted, nor what their relative importance is in the decision-making strategy of different house owners. For this purpose, the model parameters associated with the decision factors undergo an empirical tuning phase exploiting the observed data, i.e., the historical installation rate of PV panels in the Emilia–Romagna region.

#### **5. The Agent-Based Model**

The process of extracting drivers and barriers from self-reported behaviours and embedding them in an agent-based model is not straightforward. Self-reported behaviour is not always easily quantifiable and, generally speaking, it cannot be used to directly infer a set of rules defining the decision algorithm of the simulated actors. In recent years, the authors of the current paper experimented with several ABMs with the purpose of better capturing the self-reported behaviour. In [42] a purely economic ABM was proposed, in order to understand the impact of national and regional incentives on the adoption of PV panels by residential homeowners. Each agent takes into consideration a series of economic factors that influence its decision to buy and install a PV panel. The model presented in [29] adds a preliminary social component: the behaviour of each agent is influenced by the decisions already taken by its neighbours and by each agent's perception of PV technology.

The knowledge gained in the previous works led to the methodology presented in this paper. For example, a very important lesson is that discounting non-economic factors generates ABMs that fail to properly reflect the observed behaviour. The critical improvement of the currently proposed approach is the refinement of the agents' decision process and, most importantly, the fine-tuning strategy that leads to a model capable of predicting future trends in the PV panel installation rate. A preliminary version of the model discussed here (but without the optimal fine-tuning and predictive capability) was already presented in [43]. The model presented in this paper also adds an entirely new aspect compared to previous works, namely the adoption of real geographical data to obtain a more realistic ABM.

The model discussed in [43] was used mainly as a proof of concept of the proposed approach: it allowed us to explore possible methods for merging self-reported behaviour (encapsulated in the decision algorithm of the agents in the model) and observed behavior (the real, historical data on PV installed power in the Emilia–Romagna region). The agent-based model was created and its parameters were tuned using the methodologies described in the following section, but after the fine-tuning no validation was performed. In particular, the model was trained using the historical data gathered in the 2007–2013 time frame; no assessment of its predictive capacity was performed, in part due to a lack of sufficient observed data. Conversely, the current work provides a comprehensive evaluation of the overall proposed methodology and of the predictive capability of the approach. In recent years, additional observed data was collected (4 more years, 2014–2017) and used to properly validate the ABM after the parameter tuning.

The model was composed of two types of agents: house owners and the region. The region agent provided regional incentives to house owners; at the start of the period there are some initial funds, and each year the region receives a further constant budget to foster the installation of PV plants. The house owners (also referred to as households) are the main actors in the simulation: each house owner decides whether to install a PV panel or not, based on the decision process illustrated in the following sections. Each house owner is described by a set of attributes: age class, education level, income, family size, consumption, roof area, budget, geographical coordinates and social class. These attributes jointly define the household's behaviour and are used to build a social network linking all agents.

The ABM was built in two stages. First, the simulation environment was set up and the virtual world was populated with agents (the households)—this was the configuration phase. The placement of houses followed the actual building distribution in the Emilia–Romagna region, in particular taking into account house positions and their roofs. Then, the social network among household agents was built; the network was created depending on the reciprocal physical distance between households and on their distance in terms of attributes such as class, income, age, and so on. The social network's main contribution was defining how information about PV systems spread across the simulated world and its agents.

After the configuration, a second stage takes place, called the simulation phase in the rest of the paper. During this stage, the simulated world comes to life and the agents begin to ponder whether to install a PV panel or not. The simulation itself can be decomposed into a series of smaller steps, each of them lasting six months. The installation decision was influenced by several factors, ranging from financial considerations such as the household income and the initial investment cost (and the related payback time), to other aspects such as environmental sensitivity and the behaviour of neighbours (neighbours in the social network). The influence of these different factors was encapsulated in four expressions (also referred to as utility functions); these four expressions were then combined in order to establish the desire level of each agent—if the desire remains below a certain threshold, the household does not install a PV panel, otherwise it proceeds with the investment.

#### *5.1. Configuration Phase*

The initial conditions of the virtual environment are defined during the configuration stage; moreover, the simulated world was populated with the agents, placing the buildings and assigning them a roof size according to the actual distribution observed in the Emilia–Romagna region (data made publicly available by the region itself). First, the world area was filled with buildings and their roofs (fixed geographical coordinates). Second, the family households were created (as many as specified by an input parameter) and each one was placed inside a building (buildings are not shared); households with higher income and more members get the buildings with larger roofs.
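The household-to-building assignment can be sketched as a simple rank-and-pair step. The pairing rule below (sort both lists and zip them) is our reading of the text, and the attribute names and data are illustrative:

```python
# Sketch of the configuration-phase assignment: households with higher
# income and larger size get the buildings with larger roofs. The
# sort-and-zip pairing rule and the attribute names are illustrative.

def assign_buildings(households, buildings):
    """Pair households with buildings so that richer/larger families
    receive larger roofs. Each building hosts at most one household."""
    ranked_households = sorted(households,
                               key=lambda h: (h["income"], h["size"]),
                               reverse=True)
    ranked_buildings = sorted(buildings,
                              key=lambda b: b["roof_area"],
                              reverse=True)
    return list(zip(ranked_households, ranked_buildings))

households = [{"income": 30_000, "size": 2}, {"income": 75_000, "size": 4},
              {"income": 45_000, "size": 3}]
buildings = [{"roof_area": 40.0}, {"roof_area": 120.0}, {"roof_area": 80.0}]

for household, building in assign_buildings(households, buildings):
    # The highest-income household is paired with the largest roof.
    print(household["income"], "->", building["roof_area"])
```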

The positions of the buildings in Emilia–Romagna were obtained by parsing the publicly available Esri shapefiles (http://dati.emilia-romagna.it), which are the results of territorial surveys conducted by the region; in these files each building is represented by a polygon encapsulating multiple pieces of information about the building. The agent-based model proposed here requires only position (spatial coordinates) and roof size, hence only this relevant information was extracted. QGIS [44] (an open source Geographic Information System, GIS) was the tool employed to parse the Esri files and collect the needed information (position and roof size).

An important aspect that had a strong effect on the adoption of PV panels (and innovation in general) is the presence of incentive mechanisms aimed at fostering the diffusion of new (or less known) technologies. From 2006 to 2014 the Italian government offered national incentives to private citizens willing to install PV panels, namely feed-in tariffs referred to as *Conto Energia*. There were a few different tariff schemes over the incentive years ([45–48]), differing in the price guaranteed for the produced electricity. The national incentives are available to all house owners, and the tariffs are those actually offered during the considered period. On top of the Italian feed-in tariffs, regional policy makers in Emilia–Romagna devised a number of additional regional incentive mechanisms, such as investment grants, fiscal incentives, loans, interest funds, etc. (see [42] for more details). Historically, the national incentives outweighed the regional ones by at least one order of magnitude, so their influence has been much stronger [49,50]; as a consequence, regional incentives have little or no impact on the agents' behaviour.

#### 5.1.1. Social Classes

The agent households were characterized by a set of attributes whose values defined the category of the family; these attributes were: income, size (number of members), education level, age class (average value among all members), number of earners, yearly energy consumption and social class. The attribute values assigned to the households followed the real distribution in the Emilia–Romagna region, obtained from the Survey on Household Income and Wealth (SHIW) provided by the Bank of Italy (https://www.bancaditalia.it/statistiche/indcamp/bilfait/). On the basis of its attribute values, each agent possesses an associated budget for installing a PV panel. Clearly, the spending capability of a household is directly related to its income class, and this influences the price that each family is willing to pay for purchasing the PV panel. Households with higher income accept to pay more for the initial investment, while a lower income is associated with a lower budget to invest in a PV system. Generally speaking, households belonging to the same category (class) made similar decisions when deciding whether to install a PV panel. In practice, the proposed model assumed that the available budget of each family was determined by the attributes defining the family category. A linear regression model was then used to relate the budget to the explanatory variables obtained from the SHIW data.
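The budget regression can be sketched with closed-form ordinary least squares. The paper uses several SHIW variables; a single regressor (income) keeps the sketch short, and the data values are illustrative:

```python
# Minimal sketch of the budget model: simple OLS relating a household's
# PV budget to one explanatory variable (income). The actual model uses
# several SHIW variables; data values here are illustrative.

def ols(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

incomes = [20_000, 35_000, 50_000, 80_000]   # explanatory variable (toy)
budgets = [4_000, 7_000, 10_000, 16_000]     # PV budget (toy data)

slope, intercept = ols(incomes, budgets)

def predict_budget(income):
    return intercept + slope * income

print(round(predict_budget(60_000)))
```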

Households' social classes serve to mimic the different adoption rates of innovative technologies observed in many scenarios [51]. For instance, the so-called S-shaped curve [52] has been widely used to describe the adoption of an innovation: initially, the adoption rate of a new technology is slow, since it is not well understood and its benefits are unclear (or not fully perceived). In a second phase, the adoption rate rises together with the spread of the technology and the associated knowledge (mass market phase). Beyond a certain point, the market gets saturated and the adoption rate flattens. Rogers [52] identifies five categories of adopters: (1) innovators, (2) early adopters, (3) early majority, (4) late majority and (5) laggards. The different categories are usually reflected in the characteristics defining each adopter, such as socio-economic status (i.e., high-income individuals can afford to invest more in new and not yet well-established technologies). Each house owner fell into one of the five adopter categories, depending on three of its attributes (the most important features): age class, education level and income. *K*-means was used to identify the five clusters and group the agents belonging to the same class.

#### 5.1.2. The Social Network

As discussed previously, the behavior of a household agent is significantly impacted by its social network. For this reason, during the configuration phase the social links between agents are created, namely each family is given a set of friends (other households). Since previous research has shown (see [53–56]) that a small-world topology maps well onto the real network of relationships that exists between people, the social network adopted in this ABM has small-world properties. Small-world networks are characterised by a shortest-path distance between nodes that increases relatively slowly as a function of the number of nodes in the network [57].

An extended version of the rank-based model proposed by Liben-Nowell et al. [58] was used to obtain the small-world properties. The probability that a link between node *u* and node *v* existed was proportional to a ranking function which depended both on the geographical proximity of the nodes (physical neighbours) and on the attribute proximity of the nodes (how similar the nodes are w.r.t. their attributes). After a network was built using the extended rank-based method, randomness was added through long-range links. These links drastically reduce the average path length because they connect distant parts of the network. The randomization process takes every edge and rewires it with an empirically obtained probability *p*.
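A simplified sketch of this two-step construction follows: each agent links to its closest peers under a score combining geographic and attribute distance (a crude stand-in for the rank-based rule of [58]), and each edge is then rewired with probability *p*. The distance weighting, attribute choice and value of *p* are all illustrative:

```python
# Sketch of the network construction: proximity-based linking followed by
# random rewiring (long-range shortcuts). Distances, weights and p are
# illustrative; this is a crude stand-in for the rank-based model of [58].

import random

def proximity_score(u, v, alpha=0.5):
    """Combined distance: geographic (Manhattan) plus attribute distance."""
    geo = abs(u["x"] - v["x"]) + abs(u["y"] - v["y"])
    attr = abs(u["income"] - v["income"])
    return alpha * geo + (1 - alpha) * attr

def build_network(agents, k=2):
    """Each agent links to its k closest agents under the proximity score."""
    edges = set()
    for i, u in enumerate(agents):
        ranked = sorted((j for j in range(len(agents)) if j != i),
                        key=lambda j: proximity_score(u, agents[j]))
        for j in ranked[:k]:
            edges.add(tuple(sorted((i, j))))
    return edges

def rewire(edges, n, p, rng):
    """Rewire each edge with probability p to a random endpoint, creating
    the long-range links that shorten the average path length."""
    rewired = set()
    for (i, j) in edges:
        if rng.random() < p:
            j = rng.randrange(n)
        if i != j:
            rewired.add(tuple(sorted((i, j))))
    return rewired

rng = random.Random(42)
agents = [{"x": float(i), "y": 0.0, "income": i * 10.0} for i in range(10)]
edges = rewire(build_network(agents), n=10, p=0.1, rng=rng)
print(len(edges))
```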

#### *5.2. Simulation Phase*

In the simulation phase the system evolved as previously described: the decisions regarding the installation of new panels took place between 2007 and 2013, then the simulator ran until 2036 to account for the lifetime of PV panels. As described at the beginning of Section 5, each agent had a particular desire level that encapsulates its willingness to invest in a PV panel. The desire level of each household is computed during the configuration phase (Section 5.1), depending on the agent's set of attributes, and is mathematically expressed as a utility function. The function is a weighted combination of different factors: household income, payback period (of the initial investment), perceived and expected environmental benefits, and the pressure from neighbours (as identified by the social network among agents). The weights used to combine these factors depend on the household class, or category (see Section 5.1.1). The actual values of the weights cannot be obtained analytically and were instead tuned via model calibration, exploiting the real historical data on PV power installed in the Emilia–Romagna region in the 2007–2013 period (details in Section 6).

The average lifetime of a PV system was 20 years; the expenses and gains accumulated during this lifespan served to estimate the return on equity (ROE) of a PV panel. The yearly cash flow was computed by subtracting the yearly total expenses from the yearly earnings—clearly considering only PV-related financial movements. The yearly expenses were obtained by summing the cost of the system divided by its lifetime (mortgage payment), the maintenance costs and the interest on any loans. Potential yearly earnings comprised the surplus electricity sold to the national electrical grid and the electricity bill savings granted by self-production. Alongside these, there could be national and/or regional incentives, with a profound impact on the overall profitability of the investment. The incentives can influence yearly cash flows in different ways: for instance, the gains are directly linked to the Italian national feed-in tariffs, while yearly expenses depend on the initial cost (affected by regional investment grants) and loan interest (the target of several incentive schemes).
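The cash-flow bookkeeping described above can be written down directly. The tariffs, costs and consumption figures below are illustrative, not the actual Conto Energia values:

```python
# Sketch of the yearly cash-flow and ROE computation described above.
# All tariffs, costs and energy figures are illustrative.

def yearly_cash_flow(system_cost, lifetime, maintenance, loan_interest,
                     sold_kwh, feed_in_tariff, saved_kwh, retail_price):
    """Yearly earnings minus yearly expenses, PV-related movements only."""
    expenses = system_cost / lifetime + maintenance + loan_interest
    earnings = sold_kwh * feed_in_tariff + saved_kwh * retail_price
    return earnings - expenses

def roe(equity, yearly_flows):
    """Return on equity over the panel lifetime: cumulated net gains
    relative to the initial investment."""
    return sum(yearly_flows) / equity

flow = yearly_cash_flow(system_cost=12_000, lifetime=20, maintenance=100,
                        loan_interest=0, sold_kwh=1_500, feed_in_tariff=0.3,
                        saved_kwh=2_000, retail_price=0.2)
print(round(flow, 2))                      # yearly net gain
print(round(roe(12_000, [flow] * 20), 2))  # ROE over the 20-year lifetime
```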

Each household has to find the optimal size for the PV system (the size that maximises the ROE); if the conditions characterizing an agent are unfavourable, the house owners can also opt not to install a PV panel at all. The problem is solved with a heuristic algorithm based on simulated annealing [59]. The proposed model assumes that households aim at making well-informed decisions, for example by getting advice from PV installers in order to properly understand the available options. Hence, agents are supposed to purchase PV panels with the goal of maximising their reward w.r.t. energy production and financial savings.
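The size-optimisation step described above can be sketched as follows: a minimal simulated-annealing search over candidate panel sizes, where the ROE curve (`toy_roe`) and the size grid are hypothetical stand-ins for the model's actual financial computation.

```python
import math
import random

def simulated_annealing_size(roe, sizes, steps=1000, t0=1.0, cooling=0.995):
    """Search for the PV size that maximises a given ROE function."""
    current = random.choice(sizes)
    best = current
    temp = t0
    for _ in range(steps):
        candidate = random.choice(sizes)  # propose a random alternative size
        delta = roe(candidate) - roe(current)
        # Accept improvements always; accept worse sizes with a probability
        # that shrinks as the temperature cools down.
        if delta > 0 or random.random() < math.exp(delta / temp):
            current = candidate
        if roe(current) > roe(best):
            best = current
        temp *= cooling
    # Mirror the opt-out behaviour: no installation if the ROE is never positive.
    return best if roe(best) > 0 else None

# Hypothetical ROE curve: gains grow with size, costs grow faster past 6 kW.
toy_roe = lambda kw: 1.2 * kw - 0.1 * kw ** 2
sizes_kw = list(range(1, 11))
```

For a small discrete grid like this, exhaustive search would of course suffice; simulated annealing pays off when the size space is large or the ROE evaluation is expensive.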

#### The Utility Function

One of the most important components of the proposed approach is the criterion used by agents/households to decide whether to install a PV panel. As mentioned earlier, this decision is taken by each agent based on the values of its attributes. The decision criterion for agent *v* is expressed by the utility function (also referred to as the desire level):

$$\mathcal{U}(v) = w\_P(c\_v)\mu\_P(v) + w\_B(c\_v)\mu\_B(v) + w\_E(c\_v)\mu\_E(v) + w\_N(c\_v)\mu\_N(v) \tag{1}$$

where *cv* is the class of the agent (details in Section 5.1.1). The utility function is a weighted combination of four components:

- *μP*(*v*), the payback period factor;
- *μB*(*v*), the household budget factor;
- *μE*(*v*), the environmental benefit factor;
- *μN*(*v*), the neighbourhood influence factor.

Each factor in Equation (1) is weighted by a class-dependent parameter; these weights are *wP*(*cv*), *wB*(*cv*), *wE*(*cv*) and *wN*(*cv*), where the notation expresses their dependency on the agent class *cv*. Each agent has its own set of weights; agents of the same social class do not necessarily share the same weights (although this can happen as a byproduct of the parameter tuning procedure). A key aspect of the proposed approach is assigning correct values to these weights: this is the crucial step where the self-reported behaviour obtained with questionnaires and interviews (which guided the definition of the agents' behaviour) is merged with the observed behaviour (historical data on installed PV power). This step is described in detail in Section 6.

The first factor in Equation (1) regards the payback period, *pp*. To obtain a balanced influence of all factors in the utility function, all factors were normalized to the [0,1] range. In the case of the payback period, the normalization takes advantage of the bounds on the minimum payback period *min*(*pp*) (assumed to be equal to one year) and on the maximum payback period *max*(*pp*), assumed to be 21 years since the expected useful life of a PV system is 20 years. Hence, the payback influence for agent *v* is computed following [28] and expressed by this equation:

$$\mu\_P(v) = \frac{\max(pp) - pp(v)}{\max(pp) - \min(pp)} = \frac{21 - pp(v)}{20} \tag{2}$$

where *pp*(*v*) is the payback period for the initial investment. Its value is computed using the net present value (NPV) of the PV system: the NPV typically starts with negative values (due to the initial cost of the investment) and gradually gets closer to zero, as the initial cost is offset by the yearly gains due to electricity bill savings and the sale of self-produced energy. The point at which the NPV turns from negative to positive marks the moment when the investment becomes profitable. The computation of the NPV is based on the yearly cash flows: each agent measures its expenses and gains (taking into account national and regional incentives as well) and computes its yearly NPV accordingly.
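A minimal sketch of this NPV-based payback computation might look as follows; the discount rate and cash-flow figures are illustrative assumptions, not values from the paper.

```python
def payback_period(initial_cost, yearly_cash_flows, discount_rate=0.03):
    """Return the first year in which the cumulative NPV turns positive."""
    npv = -initial_cost  # the NPV starts negative due to the initial investment
    for year, cash in enumerate(yearly_cash_flows, start=1):
        npv += cash / (1 + discount_rate) ** year  # discounted yearly cash flow
        if npv > 0:
            return year
    return None  # never profitable within the horizon

def mu_p(pp, pp_min=1, pp_max=21):
    """Payback factor of Equation (2), normalised to the [0, 1] range."""
    return (pp_max - pp) / (pp_max - pp_min)
```

For example, a hypothetical 10,000-euro system yielding 1500 euros of PV-related cash flow per year reaches a positive NPV in year 8, giving a payback factor of (21 − 8)/20 = 0.65.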

The household budget factor *μB*(*v*) is given by:

$$\mu\_B(v) = \frac{1}{1 + e^{-v\_B/v\_{equity}}} \tag{3}$$

with the initial investment *vequity* computed as the PV panel installation cost minus any applicable incentives, and *vB* the disposable budget of the household.

The third factor contributing to the agents' decision is the environmental benefit that can be gained by adopting PV technology instead of consuming electrical energy coming from non-renewable sources. These benefits are measured in terms of oil saved, which is in turn correlated with an overall decrease in CO2 production. The Italian Regulatory Authority for Electricity and Gas provides a factor to convert the produced energy (expressed in MWh) into its equivalent in tonnes of oil equivalent (TOE): a TOE is defined as the amount of energy released by burning one tonne of oil, and each MWh produced corresponds to 0.187 TOE. Thanks to this conversion, the ecological benefit can be computed with the following equation:

$$\mu\_E(v) = \frac{1}{1 + e^{-(oil\_{notConsumed} - oil\_{consumed})}} \tag{4}$$

The final component of the desire function (Equation (1)) is the influence of the other members of the agent's social network. This factor is identified by *μN*(*v*) and encapsulates the importance of the neighbours' choices in shaping the household behaviour. As previously mentioned, the agent's neighbours are the nodes (other households) with which it shares links; the vicinity of two nodes depends on geographical proximity and social class similarity. The neighbourhood influence contribution is computed with the following equation:

$$\mu\_N(v) = \frac{1}{1 + e^{\frac{1}{2}L\_{v,tot}/L\_{v,adopter}}} \tag{5}$$

with *Lv*,*tot* being the total number of links of agent *v* and *Lv*,*adopter* the number of links shared with adopters.
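Putting the pieces together, Equation (1) amounts to a class-dependent weighted sum of the four factors; in the sketch below, the factor values and class weights are purely hypothetical.

```python
def desire_level(agent, class_weights):
    """Equation (1): class-dependent weighted sum of the four factors."""
    w = class_weights[agent["class"]]  # weights depend on the agent class c_v
    return (w["P"] * agent["mu_P"] + w["B"] * agent["mu_B"]
            + w["E"] * agent["mu_E"] + w["N"] * agent["mu_N"])

# Purely illustrative agent attributes and class weights.
agent = {"class": "middle_income",
         "mu_P": 0.65, "mu_B": 0.80, "mu_E": 0.55, "mu_N": 0.30}
class_weights = {"middle_income": {"P": 0.4, "B": 0.3, "E": 0.1, "N": 0.2}}
```

With these numbers the desire level is 0.4·0.65 + 0.3·0.80 + 0.1·0.55 + 0.2·0.30 = 0.615; in the actual model, the weights come from the calibration procedure of Section 6.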

#### **6. Parameter Tuning**

So far, only the agent-based model has been described. The model was built upon the insights gained by analysing the self-reported behaviour gathered through questionnaires and interviews. By leveraging this information, it was possible to identify the barriers and drivers that affect the decision criterion regulating the installation of a PV panel (see Section 4). These factors were then used as a guide for the behaviour of the agents in the simulation world; however, the algorithm describing the agents' behaviour hinges on a set of parameters that cannot be easily obtained through analytical tools. At this point, the second stage of the proposed methodology comes into play, namely the observed behaviour. Historical data are used to fine-tune the model parameters, thus obtaining a model that is capable of faithfully describing real-world dynamics and that can be used to make accurate predictions, as reported in Section 7.

The historical data on the PV panel installation trend in the Emilia–Romagna region were gathered from the data provided by the Italian government [49], in particular the PV installation trend in Emilia–Romagna from 2007 to 2017 (unfortunately, the data regarding years earlier than 2007 are very scarce, due to the almost negligible consideration given by the Italian government to PV technology at the time). The data set is divided into two chunks: a training set, from 2007 to 2013, and a test set, from 2014 to 2017. The training set is used to fine-tune the model parameters (trying to fit the simulated trend to the observed one). Afterwards, the trained model can be used to predict the PV installation rate during the test period; it is then possible to evaluate the quality of the prediction, and thus the accuracy of the model, by comparing the historical data with the predicted ones.

As a reminder, the parameters that needed to be tuned were the weights of the utility function: *wP*(*cv*), *wB*(*cv*), *wE*(*cv*), and *wN*(*cv*). In practice, the goal was to find the weight values that best fit the curve representing the Emilia–Romagna PV power installation rate. The tuning problem can be seen as fitting a model to real data, and several methods exist to perform this task. After a preliminary evaluation of different methods, a genetic algorithm (GA) [60] emerged as the technique providing the best results without requiring excessive computational resources. GAs are adept at finding solutions in spaces where it is hard to derive analytical models and it is not easy to mathematically find global optima. Another benefit of genetic algorithms is that they have proven to be very effective at dealing with problems where small changes in the weight configuration can have a great impact on the final outcome. This is the case for the proposed agent-based model: the four factors of the desire function are strongly intertwined and linearly combined (see Equation (1)). Moreover, the decision criterion is also influenced by social interaction: the weights assigned to a given agent can modify its decision, which in turn has an impact on the decision process of other (possibly many) neighbours.

The genetic algorithm begins with a random initial population of parameter configurations (the "individuals", in GA terminology). The initial population is then evaluated by running the agent-based model and observing the PV power installed by all households under the currently applied parameters. Since the target PV power is available during the training phase (the real, historical PV installation data in Emilia–Romagna), it is possible to assess the accuracy of the fit of the current population by measuring the difference between target and simulated power. After the evaluation, the GA selects the next generation of individuals (a different set of model parameters), which are evaluated as well, with the goal of finding the best-fitting population. To generate the next population, tournament selection [61] was employed: *k* individuals are selected from the current population using *n* tournaments of *j* individuals. From every tournament emerges a winner (the individual with the highest fitness, i.e., the one generating the smallest distance between simulated and historical PV power), and this is the parameter configuration selected for the next generation.

The new population is not deterministically decided via tournament selection alone; a certain degree of randomness is introduced through crossover and offspring mutation. The former mechanism randomly chooses two individuals for reproduction and breeds one or more children from them; in the proposed genetic algorithm, one-point crossover was used: a single crossover point on both parents' configurations is selected, and a new child configuration is created by swapping the values beyond the crossover point. The second random mechanism, mutation, works by randomly modifying values (parameters of the agent-based model) in randomly chosen individuals. The evolution process (creation of offspring, random mutations, evaluation) was repeated four hundred times.
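The three GA operators described above (tournament selection, one-point crossover, and mutation) can be sketched as follows; the individuals are lists of utility-function weights, and the fitness function is assumed to be supplied by the caller (e.g., the negative distance between simulated and historical PV power).

```python
import random

def tournament_select(population, fitness, k, j):
    """Pick k parents; each one wins a tournament among j random individuals."""
    winners = []
    for _ in range(k):
        contenders = random.sample(population, j)
        winners.append(max(contenders, key=fitness))  # highest fitness wins
    return winners

def one_point_crossover(parent_a, parent_b):
    """Breed a child by swapping the tails of two weight configurations."""
    point = random.randint(1, len(parent_a) - 1)  # single crossover point
    return parent_a[:point] + parent_b[point:]

def mutate(individual, rate=0.1, scale=0.05):
    """Randomly perturb some of the weights of an individual."""
    return [w + random.gauss(0, scale) if random.random() < rate else w
            for w in individual]
```

In the paper's setting, evaluating `fitness` is the expensive part, since it requires a full run of the agent-based simulation for each candidate weight configuration.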

#### **7. Model Validation**

After the parameter tuning via genetic algorithm described in the previous section, the accuracy of the resulting model needed to be measured. For this analysis, the ABM was composed of 2000 agents; each simulation required around 10 s on a 2.40 GHz Intel quad-core machine (i7-5500U CPU) with 16 GB of RAM. The genetic algorithm used a population of 50 individuals, and the overall time required to calibrate the model was around 30 h. As mentioned before, the parameter tuning was performed using the observed data in the 2007–2013 period; the observed data in the 2014–2017 range were used only during testing. To summarize, the experimental setup was the following: (1) create an ABM and calibrate its parameters with the genetic algorithm; (2) simulate the 2007–2017 period with the fine-tuned ABM; (3) observe the simulated cumulative installed PV power and adoption rate: the difference between observed and simulated data in 2007–2013 measures the quality of the fine-tuning technique, while the difference measured in the 2014–2017 period serves to evaluate the predictive capability of the model. This scheme was repeated 30 times to obtain statistically significant values; in the rest of the paper, only mean values are reported (in both graphs and tables). As evaluation metrics, we considered the mean absolute error (MAE), the root mean squared error (RMSE), the mean absolute percentage error (MAPE), and the coefficient of determination (*R*2). These are standard metrics, defined by the following equations (the equation for *R*<sup>2</sup> is not reported for the sake of brevity):

$$MAE = \frac{1}{N} \sum\_{i=1}^{N} |o\_i - s\_i| \tag{6}$$

$$RMSE = \sqrt{\frac{1}{N} \sum\_{i=1}^{N} (o\_i - s\_i)^2} \tag{7}$$

$$MAPE = \frac{100}{N} \sum\_{i=1}^{N} \frac{|o\_i - s\_i|}{o\_i} \tag{8}$$

where *N* is the number of runs (30), *si* is the simulated value, for instance the PV power installed in one year in the simulated environment of the ABM, and *oi* is the observed value.
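For reference, Equations (6)–(8) translate directly into code:

```python
import math

def mae(obs, sim):
    """Mean absolute error, Equation (6)."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def rmse(obs, sim):
    """Root mean squared error, Equation (7)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mape(obs, sim):
    """Mean absolute percentage error, Equation (8)."""
    return 100 / len(obs) * sum(abs(o - s) / o for o, s in zip(obs, sim))
```

Note that the MAPE is undefined when an observed value is zero, which is not an issue here since the yearly installed PV power is always positive in the considered period.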

We considered two outcomes to measure the fine-tuning and prediction results: (1) the yearly installed PV power and (2) the cumulative installed power, i.e., the total value obtained by summing the previous years' installed power and the current year's installed capacity. The former value better reflects the yearly changes in adoption rate, while the latter could be more useful to policy makers devising strategies to reach long-term goals (e.g., a certain amount of total PV power by year 2020). Table 1 reports the validation results after the parameter tuning. Each row corresponds to a year; the table includes both the period used as the training set (2007–2013) and the period used to evaluate the predictive capability (2014–2017), separated by a horizontal line. The last row reports the average values computed over the whole 2007–2017 time frame. The first column corresponds to the year; the following three columns report the MAE, RMSE and MAPE for the yearly installed PV power; the last three columns show the same metrics computed on the cumulative installed power.


**Table 1.** Validation and prediction results of installed photovoltaic (PV) power; yearly rate and cumulative installations.

Let us first consider the yearly adoption rate. Following the type of validation performed by most other works in the literature, i.e., looking only at the results on the training set, it can be noted that the fit is extremely good. The average *R*<sup>2</sup> on the training set is equal to 0.997, the MAE to 0.385, the RMSE to 0.483, and the MAPE to 4.52. These values indicate that the parameter tuning was very effective. A higher MAPE can be observed for the year 2008 (15.41); the simulated installed PV power was lower than the observed one, which could indicate that the current version of the ABM does not include some drivers that boosted the adoption of PV panels in the early years. Moving on to the predictive capability of the proposed approach, the results remain promising. However, it can be observed that the accuracy decreases on the test set, especially in 2015, where the simulated installed PV power is significantly higher than the observed one; the proposed ABM overestimates w.r.t. the historical data. The statistical results obtained considering both the training and the test set are the following: *R*<sup>2</sup> = 0.991, MAE = 0.524, RMSE = 0.693, and MAPE = 7.71. Conversely, computing the average values only over the test period yields: MAE = 0.896, RMSE = 1.16, and MAPE = 16.36. The average is skewed especially by the large error made in 2015 (MAPE equal to 32.95). This is probably due to the ABM underestimating the impact of the reduced national and regional incentives on the households' decision process; as a reminder, the national incentives by the Italian government ceased at the end of 2013.

In order to contextualize these values, a comparison with the validation results of a recent related work can be considered, namely Lee et al. [39] (see Section 2 for details); this work was chosen because it provides easily comparable metrics and follows a similar approach (tuning the parameters of an ABM using historical data). The comparison might not be completely fair, since Lee et al. used different sets for parameter tuning and validation (respectively, building-based data and area-wide data, obtained by summing the power of all PV panels installed in a neighbourhood), while our ABM was trained and validated using region-wide yearly adoption data. However, the comparison is legitimate because we include a test set not used for the parameter calibration, thus increasing the difficulty of the task tackled by the proposed approach. Lee et al. proposed three different ABMs, with MAPE equal to 14.86, 13.57, and 62.52 (average values computed over all years). Our ABM instead has an average MAPE (considering both training and test set) equal to 7.71, a significantly lower value. Moreover, it is worth noting that even the results on the test set alone (MAPE = 16.36) are similar to those obtained by Lee et al., although these numbers are not directly comparable since they refer to different tasks: pure validation (Lee et al.) versus prediction (the approach proposed in this paper).

Figure 2 shows the PV installation growth rate in the considered time period (2007–2017). The solid blue line corresponds to the PV growth rate predicted by the agent-based model, while the black dashed line depicts the real installation trend in Emilia–Romagna. The *x*-axis displays the year and the *y*-axis reports the PV power growth in percentage. The figure contains the results for both the training set (2007–2013) and the test set (2014–2017). In this way, it is possible to assess the quality of the proposed fitting mechanism by observing the discrepancies between the lines in the training set. At the same time, by looking at the differences between the lines in the test set, the figure reveals that the ABM does not suffer from overfitting, and that the proposed methodology can be used to create models that generalize well and, consequently, can be used for predictive purposes. The visual analysis reveals results similar to those presented in the quantitative analysis. In fact, it can be noted that the parameter tuning was very effective (in practice, the two lines overlap in the training set) and that the predictive capabilities are good as well, albeit slightly less accurate. Focusing on the PV case study, useful insights can also be gained by looking at the installation trends, both real and simulated. Both curves clearly indicate that the initial growth rate was relatively slow (2007–2009), possibly due to an initial reluctance caused by limited knowledge of and doubts about PV technology. As knowledge became more widespread and the Italian national incentives increased (higher feed-in tariffs were offered in 2010 and 2011), the installation growth rose steeply, with a distinct peak around 2011. After 2011, there was a steady decline in the number of new PV panels installed, as the financial benefits for homeowners shrank, due to a decrease in national incentives compensated neither by regional incentives nor by a sufficient decrease in the cost of the technology.

**Figure 2.** Model calibration results—yearly installed photovoltaic (PV) power.

This could be partially explained by the extremely complex situation that occurred in the Italian PV sector in recent years, with longstanding incentive mechanisms that abruptly came to an end and different regulations following one another. This situation generated a marked discontinuity in the installation trend, a discontinuity that was very hard to forecast. Clearly, there is still room for improvement in terms of ABM accuracy, but it is important to notice that the proposed approach can already emulate the observed behaviour with a precision more than sufficient to help policy makers in their decisions.

Let us now look at the validation results for the cumulative installed power (last three columns of Table 1). The effect of the lower accuracy on the test set is amplified by the smaller magnitude of the installation rates observed in the 2014–2017 period w.r.t. the peak values observed in previous years. If we consider the total PV power installed over the years, the results of the parameter calibration are still very good, and the "small-values" effect noticed in the test set loses its influence. Figure 3 shows the cumulative installed PV power in Emilia–Romagna in 2007–2017, both according to the historical data (black dashed line) and to the agent-based model (red continuous line). Since our simulator considers only 2000 agents against the millions of households in Emilia–Romagna, the historical and simulated absolute values differ by orders of magnitude; in order to make the comparison possible, both sets (observed and simulated) were normalized by dividing by the maximum value (year 2017).

As both the quantitative analysis and the graph reveal, the parameter tuning works even better for the cumulative installed power. In this case, the average values computed over the whole time frame are the following: *R*<sup>2</sup> = 0.999, MAE = 0.008, RMSE = 0.009, and MAPE = 2.73. The error is lower w.r.t. the case of the yearly installed PV power because the yearly changes in adoption rate are "smoothed" by the sum operation. This effect is even more pronounced when considering the test set alone. In this case, the MAE is equal to 0.011, the RMSE to 0.015, and the MAPE to 0.92; the MAPE in particular is even lower than in the training set case. This happens because in the last years (2014–2017) the PV panel installation rate greatly slowed down; hence, the errors made in the test-set years have a relatively smaller impact on the total installed power.

**Figure 3.** Model calibration results—normalized cumulative installed PV power.

#### **8. Conclusions**

In this paper, we presented a novel methodology to fill the gap between self-reported behaviour and observed behaviour by means of agent-based modelling and empirical parameter tuning. As a case study, we considered the diffusion of photovoltaic power in the Emilia–Romagna region of Italy. The first step of the approach consisted in a data collection phase to enable the identification of the drivers and barriers that influence the decision-making process of house owners faced with the possibility of installing a PV panel on top of their houses. The data were collected through online questionnaires and interviews.

These drivers and barriers were then used to model the decision process of the agents composing the simulation. Having both the self-reported and the observed behaviour, the parameters attached to the various decision factors can be empirically tuned, thus enabling agent-based models to be used also for predictions, even if in an approximate manner. The idea is that, given an agent-based model based on the self-reported behaviour, its parameters can be adjusted and tuned exploiting past real data. Hence, an ABM that takes into account economic, social and geographical factors to emulate the self-reported behaviour was proposed. The model is characterized by a set of parameters that were fine-tuned using a genetic algorithm. Finally, the accuracy of the model prediction was evaluated by analysing the difference between the historical PV installation rate and the results produced by the simulator. The results are very promising, and the proposed approach can be used by policy makers to guide their decisions.

Several future research directions have yet to be explored. First, the ABM can be refined in order to achieve an even greater prediction accuracy. Second, it is important to test the proposed methodology under different conditions (different regions/countries, extended time periods). Another possible direction consists of scaling up the simulation size, up to the point of including hundreds of thousands of agents; this should lead to results even closer to the observed data. In our opinion, the most promising direction is to integrate the proposed approach and predictive model into a larger scheme aimed at helping policy makers with their task. After having bridged the gap between self-reported behaviour and observed behaviour, the following step would be to reach a target behaviour, i.e., the desired level of photovoltaic power production. For this purpose, the agent-based model could be used to extract the best guidelines for policy makers to achieve the desired strategic objectives.

**Author Contributions:** Conceptualization, A.B. and M.M.; data curation, A.B.; funding acquisition, M.M.; investigation, A.B.; Methodology, A.B.; Project administration, M.M.; resources, M.M.; software, A.B.; supervision, M.M.; validation, A.B.; writing—original draft, A.B.; writing—review & editing, A.B. and M.M.

**Funding:** EU ePolicy project (FP7/2007–2013), g.a. 288147.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Improving Computational Efficiency in Crowded Task Allocation Games with Coupled Constraints**

#### **Ming Chong Lim and Han-Lim Choi \***

Department of Aerospace Engineering & KI for Robotics, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea; mclim@kaist.ac.kr

**\*** Correspondence: hanlimc@kaist.ac.kr

Received: 17 April 2019; Accepted: 17 May 2019; Published: 24 May 2019

**Abstract:** Multi-agent task allocation is a well-studied field with many proven algorithms. In real-world applications, many tasks have complicated coupled relationships that affect the feasibility of some algorithms. In this paper, we leverage the properties of potential games and introduce a scheduling algorithm to provide feasible solutions in allocation scenarios with complicated spatial and temporal dependencies. Additionally, we propose the use of random sampling in a Distributed Stochastic Algorithm to enhance the speed of convergence. We demonstrate the feasibility of such an approach in a simulated disaster relief operation and show that good feasible results can be obtained when the confirmation and sample size requirements are properly selected.

**Keywords:** multi-agent systems; multi-agent planning and scheduling; potential game; equilibrium selection

#### **1. Introduction**

Applications of multi-agent systems have often been considered for large-scale problems that are otherwise difficult or impossible to solve with a single agent. Despite the difficulties in coordinating multiple agents, systems with multiple heterogeneous agents have often been preferred over a single omnipotent agent; the belief that the collective effort of multiple agents is superior to that of an individual can be attributed to considerations of robustness, parallelism, and cost.

Therefore, the challenges in managing a multi-agent system have been extensively studied, with the proposal of a variety of approaches and algorithms [1–8] applicable to a multiplicity of systems ranging from military operations to resource distribution [2,6,8–10]. One notable approach is the introduction of a game-theoretic framework in multi-agent systems. With the continuous development of intelligence in autonomous systems, concerns of social awareness and acceptability [11] have frequently been raised. In traditional systems designed to achieve near-optimal solutions, agents are often programmed to be greedy and to seek maximum rewards for their actions, leading to a competitive environment within the system, such that an agent may act in a manner that jeopardizes the overall objective of the group for its own good. This concern leads to the adoption of a game-theoretic framework, which gives each individual agent freedom over its actions while, at the same time, ensuring that the solutions produced by the collective action are socially acceptable and stable despite the players being self-regarding. As such, many game-theoretic models have been proposed to meet the requirements of different application scenarios, and equilibrium selection algorithms ranging from constraint optimization [2,12] to learning [13–15] have been designed and shown to be feasible.

In this paper, we consider a game-theoretic framework for the coupled-constraint task allocation problem, which is highly applicable to military and disaster relief operations where the number of tasks is generally significantly large and the tasks are frequently spatially and temporally correlated. In such emergency situations, a higher level of emphasis is placed on the speed of decision-making rather than on the optimality of the decisions; as such, the main objective of this paper is to propose a game-theoretic model that enables autonomous multi-agent systems to make quick decisions that are feasible despite possibly having lower levels of optimality. We build on our previous work [16], which considered a game-theoretic framework for spatially constrained task allocation environments; here we take into account additional temporal constraints and, at the same time, propose some modifications to the Distributed Stochastic Algorithm to allow quicker convergence in congested problems.

#### **2. Preliminaries**

#### *2.1. Coupled-Constraint Consensus-Based Bundle Algorithm (CCBBA)*

CCBBA [8] is an extension of a market-based task allocation model called the Consensus-Based Bundle Algorithm (CBBA) [3]. CCBBA was designed to address the inadequacies of CBBA, namely the inability to feasibly resolve problems with complex spatial and temporal relationships. As far as we know, CCBBA is the first algorithm to handle such relationships, which mimic the requirements of emergency operations in the real world. Coupled constraints, which affect both space and time, can be succinctly described as follows:

#### **Definition 1.** *(Spatial coupled constraints)*


#### **Definition 2.** *(Temporal coupled constraints)*


These constraints may not be exhaustive, but they are more than sufficient to meet the requirements of real-world applications. Mathematically, the constraint relationships between the tasks can be denoted by two matrices: the dependency matrix, D, and the temporal matrix, T. The entry D*q*,*p* describes the spatial constraints between the *q*th and *p*th elements in the form of a coded variable shown in Table 1, while the entry T*q*,*p* specifies the maximum amount of time by which task *q* can begin after task *p* begins.



Although CCBBA can handle most coupled-constraint task allocation, it faces convergence issues in certain problem scenarios due to latency. As the problematic scenarios cannot be accurately predicted beforehand, it is infeasible to depend solely on CCBBA in emergency operations.

#### *2.2. Potential Games*

The concept of a potential game was first conceived by Monderer [17] and lays the foundations for much of today's work in game theory.

**Definition 3.** *(Ordinal potential game) [18] A game is an ordinal potential game if and only if a potential function <sup>φ</sup>*(A) : <sup>A</sup> → <sup>R</sup> *exists such that:*

$$\mathcal{U}\_i(a\_i^{\prime\prime}, a\_{-i}) - \mathcal{U}\_i(a\_i^{\prime}, a\_{-i}) > 0 \Leftrightarrow \phi(a\_i^{\prime\prime}, a\_{-i}) - \phi(a\_i^{\prime}, a\_{-i}) > 0 \qquad \forall a\_i^{\prime}, a\_i^{\prime\prime} \in \mathcal{A}\_i, \quad \forall i \in \mathcal{N} \tag{1}$$

*where* A*<sup>i</sup> is player i's action set,* N *is the set of all players, Ui is the local utility of player i, a*′*<sup>i</sup> and a*′′*<sup>i</sup> are two actions of player i, and a*−*<sup>i</sup> is the action of all other players except i.*

Effectively, a game is an ordinal potential game if and only if any unilateral change in actions by an individual that leads to a change in its local utility affects the potential function in the same manner. The suitability of a game-theoretic framework for task allocation problems lies in the characteristics of a potential game. Foremost, every finite potential game possesses the finite improvement property and has at least one pure-strategy equilibrium. Additionally, every game with the finite improvement property always converges to a Nash equilibrium [17,19–21]. Therefore, by modeling a task allocation problem as a finite potential game, we can guarantee the existence of a feasible pure-strategy allocation that is socially acceptable.
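The finite improvement property can be illustrated with a toy congestion-style game, where each unilateral improvement step raises the (implicit) potential until no player can improve; the payoffs below are hypothetical and not part of any model in this paper.

```python
def best_response_dynamics(n_players=2, n_resources=2, max_rounds=100):
    """Iterate unilateral best responses in a tiny congestion game.

    Each player picks one of n_resources; a player's utility is the negative
    congestion (number of users) of its chosen resource, so spreading out is
    an equilibrium. Returns the joint action once no player wants to deviate.
    """
    choice = [0] * n_players  # everyone starts on resource 0
    for _ in range(max_rounds):
        changed = False
        for i in range(n_players):
            def utility(r):
                # congestion seen by player i if it picks resource r
                load = sum(1 for j, c in enumerate(choice) if j != i and c == r)
                return -(load + 1)
            best = max(range(n_resources), key=utility)
            if utility(best) > utility(choice[i]):
                choice[i] = best  # a strict unilateral improvement step
                changed = True
        if not changed:  # no player can improve: a pure Nash equilibrium
            return choice
    return choice
```

Because this is a congestion (hence potential) game, every strict improvement step strictly increases the potential, so the loop is guaranteed to terminate at a pure-strategy equilibrium; here the two players end up on different resources.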

#### *2.3. Game-Theoretic Models*

Multiple game-theoretic models for task allocation have been introduced [1,2,5,6,22], and our previous work introduced a game-theoretic model for task allocation problems with spatial coupled constraints [16]. Consider a set of agents N = {1, ··· , *n*} and a set of tasks M = {*k*<sub>1</sub>, ··· , *k<sub>m</sub>*}. The action set of agent *i*, A*<sub>i</sub>*, is then an ordered combination of all compatible tasks available, inclusive of the null action **0**, and the action profile *a<sub>i</sub>* ∈ A*<sub>i</sub>* is the (ordered) path that agent *i* will take. The reward of each task, *u<sup>kj</sup>*, for a generic game-theoretic task allocation model is then defined as

$$u^{k\_j}(a) = \begin{cases} v\_{k\_j}(a) & \text{if } t\_{k\_j}^c(a) \le t\_{k\_j}^d \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where *v<sub>kj</sub>*(*a*) is the reward of the task, *t<sup>c</sup><sub>kj</sub>*(*a*) is the completion time, and *t<sup>d</sup><sub>kj</sub>* is the deadline. Both the reward provided and the time at which the task will be completed depend on the collective action of all agents. To consider the spatial relationships between the tasks, the reward structure in Equation (2) can be adapted [16] to give

$$u^{k\_j}(a) = \begin{cases} -\mathbb{C}\_{k\_j}(a) & \text{if spatial constraints are violated,} \\ v\_{k\_j}(a) & \text{else if } t\_{k\_j}^c(a) \le t\_{k\_j}^d \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$

where *C<sub>kj</sub>*(*a*) is a function, dependent on the collective action of all agents, that returns a strictly positive real value. With either reward structure, the marginal utility of each agent for each task is then

$$
\mu\_i^{k\_j}(a\_i, a\_{-i}) = u^{k\_j}(a\_i, a\_{-i}) - u^{k\_j}(\mathbf{0}, a\_{-i}), \tag{4}
$$

and the local utility of any agent is given as the sum of its marginal utilities,

$$\mu\_i(a\_i, a\_{-i}) = \sum\_{k\_j \in \mathcal{M}} \mu\_i^{k\_j}(a\_i, a\_{-i}).\tag{5}$$

The agents in the model act in a greedy manner at each iteration, selecting an action profile *a<sub>i</sub>*<sup>∗</sup> that maximizes their local utility,

$$a\_i^\* = \arg\max\_{a\_i \in \mathcal{A}\_i} u\_i(a\_i, a\_{-i}^\*), \tag{6}$$

and the global utility is the sum of all task utilities,

$$\mu\_{\mathcal{S}}(a) = \sum\_{k\_j \in \mathcal{M}} u^{k\_j}(a). \tag{7}$$
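Equations (4), (5), and (7) can be illustrated with a toy reward function (all names and values below are hypothetical, not the paper's data): the marginal utility compares a task's reward under an agent's action against the null action, the local utility sums the marginals over tasks, and the global utility sums the task rewards.

```python
def task_reward(task, joint):
    # Toy stand-in for u^{k_j}(a): 1 if any agent attends the task.
    return 1.0 if any(task in path for path in joint.values()) else 0.0

def marginal_utility(i, task, joint):
    null = dict(joint)
    null[i] = ()  # agent i plays the null action instead
    return task_reward(task, joint) - task_reward(task, null)  # Eq. (4)

def local_utility(i, tasks, joint):
    return sum(marginal_utility(i, t, joint) for t in tasks)   # Eq. (5)

def global_utility(tasks, joint):
    return sum(task_reward(t, joint) for t in tasks)           # Eq. (7)

joint = {"A1": ("k1", "k2"), "A2": ("k2",)}
tasks = ["k1", "k2", "k3"]
print(local_utility("A1", tasks, joint))  # 1.0: only k1 hinges on A1
print(global_utility(tasks, joint))       # 2.0
```

Note how A1's marginal utility for k2 is zero because A2 covers it regardless, which mirrors the marginal-contribution structure that aligns local and global incentives.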

For a spatial coupled-constraint task allocation problem, the above game is an exact potential game which always converges to a feasible solution when the function *C<sub>kj</sub>*(*a*) is well designed. The proofs for the game model can be found in [23].

Recall the definition of an agent's action set in the model above. The size of an agent's action set is approximately O(*m̄*!), where *m̄* is the number of tasks that the agent can service (i.e., tasks that are compatible with the agent), since the action profile is an ordered sequence of tasks to service (i.e., an ordered path). This means that the size of the action set blows up when the number of compatible tasks is large, making the model highly impractical in crowded problems which require quick decision-making, due to the large number of action combinations that need to be evaluated. In fact, this is a common problem that plagues most game-theoretic task allocation models, as the computational load of equilibrium selection is generally highly dependent on the size of the task set. Chapman [2] attempted to overcome this issue by approximating a single large game as a series of smaller static potential games within a limited time interval, where the agent's action is a vector of tasks to attend to during the interval [*t*, *t* + *w*], given as *a<sub>i</sub>* = {*k<sub>t</sub>*, *k*<sub>*t*+1</sub>, ··· , *k*<sub>*t*+*w*</sub>}. While this approach may alleviate the issue in most scenarios, it alone cannot guarantee improvements in extreme situations when many tasks are present in any one of the time intervals.
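The O(*m̄*!) growth is easy to verify by direct counting. The sketch below (illustrative, not from the paper) counts the null action plus every ordered sequence of distinct compatible tasks:

```python
from math import factorial

def action_set_size(m_bar):
    # Null action plus every ordered sequence of 1..m_bar distinct tasks:
    # 1 + sum over r of P(m_bar, r) = m_bar! / (m_bar - r)!.
    return 1 + sum(factorial(m_bar) // factorial(m_bar - r)
                   for r in range(1, m_bar + 1))

for m_bar in (3, 5, 8):
    print(m_bar, action_set_size(m_bar))
# 3 → 16, 5 → 326, 8 → 109601: a handful of extra compatible tasks
# makes exhaustive best-reply evaluation impractical.
```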

#### *2.4. Equilibrium Selection*

An equilibrium selection algorithm is a negotiation mechanism that allows players to determine a Nash equilibrium, the stable state of the game in which all players reach an agreement on their collective action. A variety of equilibrium selection algorithms have been proposed, each taking a widely different approach. Chapman [24] methodically categorized these approaches into three main categories: learning processes [13–15], traditional constraint optimization [2,12], and heuristic search [22,25–27].

As seen previously, game-theoretic task allocation models tend to be impractical in large games due to the high computational load. As such, the study of game-theoretic task allocation in crowded problems has largely been avoided. While some have attempted to propose algorithms that accelerate equilibrium selection, none seem to have convincingly tackled the problem at hand. Borowski [27] proposed a fast convergence algorithm whose convergence time is roughly linear, instead of exponential, in the number of agents, but did not provide any answers to the more demanding issue: the dependency on the number of tasks.

#### Distributed Stochastic Algorithm (DSA)

DSA is a myopic, greedy, local search algorithm that employs a random parallel schedule, in which each agent changes its action with some probability, called the degree of parallel executions. The preferred action is determined by the arg max decision rule. The motivation for implementing such a schedule is to minimize the phenomenon known as "thrashing", where having all agents change their actions at the same time unintentionally leads to a suboptimal Nash equilibrium. However, a random parallel schedule cannot completely eliminate thrashing, only minimize it. Furthermore, DSA almost surely converges to a Nash equilibrium in potential games due to the finite improvement property, as there is a positive probability of moving towards the Nash equilibrium, which is an absorbing state, at every time step [24].
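A minimal sketch of the DSA update rule described above (the names and the activation probability `p` are illustrative; this is not the paper's implementation):

```python
import random

def dsa_step(actions, action_sets, utility, p=0.5, rng=random):
    """One synchronous DSA step: each agent moves with probability p."""
    new_actions = list(actions)
    for i, choices in enumerate(action_sets):
        if rng.random() < p:  # p is the degree of parallel executions
            # arg-max decision rule against the others' current actions
            new_actions[i] = max(choices, key=lambda a: utility(i, a, actions))
    return new_actions
```

Because only a random subset of agents moves in each step, simultaneous conflicting moves ("thrashing") become less likely but are not eliminated.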

#### **3. Game Design**

We extend the model in our previous work [16]. To begin, it is necessary to determine a variable in the model that relates the agents' actions to time; the most obvious candidate is the task completion time, *t<sup>c</sup>*(*a*). If it is possible to influence this variable in such a way that the temporal constraints are reflected in the task completion time, then explicit alteration of the reward structure of the game model will not be required. (i.e., if the task completion time is determined such that the temporal constraints are always satisfied, then the agent's reward will be reduced whenever the chosen action cannot provide a feasible solution, since the task will be incomplete.) In essence, a task allocation problem with both spatial and temporal coupled constraints can be considered as two sub-problems: an allocation problem and a scheduling problem. The game model for allocation penalizes agents for actions that violate spatial constraints, while the scheduling algorithm eliminates possible rewards for actions that violate temporal constraints.

To reflect the temporal relationships between the tasks, a temporal matrix **T** is introduced. This temporal matrix differs from the matrix in CCBBA [8] in that its entries do not represent time-value restrictions but instead describe the explicit relationships between tasks using a coded variable (Table 2), for simplicity. This difference is trivial, as the temporal matrix and the proposed algorithm can be adapted to consider time relationships similar to CCBBA if required.


**Table 2.** Code for Temporal matrix entry T*q*,*p*.

While CCBBA describes six types of temporal coupled constraints, only five types need to be considered here, omitting the *between* coupled constraint. Recall that the *between* coupled constraint states that task *A* must begin after task *B* ends and end before task *C* begins. It is therefore possible to decompose a *between* coupled constraint into a *before* and an *after* constraint, as seen in Figure 1.
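Since Table 2's code values are not reproduced here, the decomposition of a *between* constraint can be sketched with assumed codes (the constants below are hypothetical placeholders, not the paper's encoding):

```python
AFTER, BEFORE = 1, 2  # assumed code values, for illustration only

def decompose_between(T, a, b, c):
    """Task a must begin after b ends and end before c begins."""
    T[a][b] = AFTER   # encode: a after b
    T[a][c] = BEFORE  # encode: a before c
    return T

T = [[0] * 3 for _ in range(3)]
print(decompose_between(T, 0, 1, 2))  # → [[0, 1, 2], [0, 0, 0], [0, 0, 0]]
```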

#### *3.1. Learnability of C<sub>kj</sub>(a)*

When considering only spatial coupled constraints, the function *C<sub>kj</sub>*(*a*) in Equation (3) is well designed if it is learnable [1,28,29]. However, when considering games with both spatial and temporal constraints, additional restrictions on the function *C<sub>kj</sub>*(*a*) need to be imposed.

**Example 1.** *Consider that agent i selects a path a<sub>i</sub>* = {*k*<sub>1</sub>, *k*<sub>2</sub>} *such that some spatial constraints are violated by k*<sub>1</sub>*. Agent i's local utility can be easily computed as*

$$u\_i(\{k\_1, k\_2\}, a\_{-i}^\*) = -\mathbb{C}\_{k\_1}(a) + v\_{k\_2} + c \tag{8}$$

*where v<sub>k2</sub> and c are some positive rewards that agent i gained from attending k*<sub>2</sub>*. Now, assume that there exists a task k*<sub>3</sub> ∈ *a*<sup>∗</sup><sub>−*i*</sub> *that k*<sub>2</sub> *is spatially dependent on. Furthermore, k*<sub>3</sub> *is also temporally dependent on k*<sub>2</sub> *such that if agent i considers a<sub>i</sub>* = *k*<sub>2</sub>*, the temporal constraints for k*<sub>3</sub> *cannot be satisfied and k*<sub>3</sub> *will be unassigned. In this case, agent i's local utility for a<sub>i</sub>* = *k*<sub>2</sub> *is*

$$
u\_i(k\_2, a\_{-i}^\*) = -\mathbb{C}\_{k\_2}(a). \tag{9}
$$

*The (possibly) only feasible solution is a<sub>i</sub>* = **0***, which leads to zero local utility for i. However, for some game settings (e.g., C<sub>k1</sub>*(*a*) *is the number of spatial constraint violations for task k*<sub>1</sub> *as a result of the collective action, a, and the reward v<sub>k2</sub> is significantly large), it is possible that*

$$
u\_i(\{k\_1, k\_2\}, a\_{-i}^\*) \ge u\_i(\mathbf{0}, a\_{-i}^\*) \ge u\_i(k\_2, a\_{-i}^\*) \tag{10}
$$

*and thus path a<sub>i</sub>* = {*k*<sub>1</sub>, *k*<sub>2</sub>}*, which is infeasible, is the preferred action. For the game to converge to a feasible solution, it is imperative that*

$$
u\_i(\{k\_1, k\_2\}, a\_{-i}^\*) < u\_i(\mathbf{0}, a\_{-i}^\*) \tag{11}
$$

$$-\mathbb{C}\_{k\_1}(a) + v\_{k\_2} + c < 0 \tag{12}$$

*or rather,*

$$-\sum\_{k\_j \in \mathcal{K}\_{\text{constrained}}} \mathbb{C}\_{k\_j}(a) + \sum\_{k\_i \in \mathcal{K}\_{\text{unconstrained}}} v\_{k\_i} + c < 0 \quad \forall k\_i, k\_j \in a\_i, \tag{13}$$

*where* K*<sub>unconstrained</sub> are the tasks assigned to agent i that do not violate any spatial constraints, and* K*<sub>constrained</sub> are the tasks assigned to agent i that violate some spatial constraints.*

In other words, the magnitude of the penalty needs to be sufficiently large to ensure that agents will never prefer an infeasible path. A possible design for *C<sub>kj</sub>*(*a*) which satisfies Equation (13) and is learnable is then

$$\mathbb{C}\_{k\_j}(a) = G \times N\_{k\_j}(a) \tag{14}$$

where *G* is a gain, and *N<sub>kj</sub>*(*a*) is the number of agents attending to task *k<sub>j</sub>* based on the collective action *a*. Any sufficiently large gain satisfies the constraint.
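A quick numeric check of the design in Equation (14), with hypothetical reward values: once the gain *G* exceeds the total attainable reward of a path, inequality (13) holds and an infeasible path can never outscore the null action.

```python
def path_utility(G, violations, rewards):
    # violations: number of agents attending each constraint-violating task
    # rewards: payoffs collected from the unconstrained tasks on the path
    return -sum(G * n for n in violations) + sum(rewards)

G = 1000  # any gain larger than the total achievable reward works
assert path_utility(G, violations=[1], rewards=[50, 20]) < 0  # Eq. (13) holds
assert path_utility(G, violations=[], rewards=[50, 20]) > 0   # feasible path rewarded
```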

#### *3.2. Scheduling Algorithm*

Before describing the scheduling algorithm, some terminology used in the discussion is first defined, along with the assumptions that are made.

**Definition 4.** *(Allocation) A task is allocated if the task is in the path of any agent (i.e., k<sub>j</sub> is allocated if k<sub>j</sub>* ∈ *a).*

**Definition 5.** *(Assignment) A task is assigned if there exists a task completion time determined by the scheduling algorithm.*

The definitions essentially differentiate allocation and assignment such that

$$
\mathcal{M}\_{\text{assigned}} \subseteq \mathcal{M}\_{\text{allocated}} \subseteq \mathcal{M}. \tag{15}
$$

In other words, not all allocated tasks will be assigned: allocated tasks that are incompatible with the temporal constraints will not be assigned. In this way, the scheduling algorithm can be thought of as an oracle which advises the agents on the suitability of their actions. All constraint violations are then determined based on assignments rather than allocations.

**Assumption 1**. *(Complete information game) Every agent has complete information with regard to the environment. They know the locations and details concerning all agents and tasks. This assumption, though demanding, is not overly restrictive. Firstly, in multi-agent systems, it is not unusual for an agent to understand the basic capabilities of other agents within the system, especially in cases where the agents are expected to cooperate. Secondly, the main research objective of this paper covers task allocation rather than path planning or mapping. Therefore, it is fair to assume that there is sufficient knowledge regarding the environment when assigning agents to tasks.*

**Assumption 2**. *(Optimistic agents) If an agent arrives at a task that it cannot complete because insufficient information on the schedule makes it impossible to determine a feasible service time window, then the agent will hold at that task for a user-specified time period before leaving. If sufficient information can be gathered during this hold period, then the agent will attempt to complete the task (i.e., determine a task completion time). In other words, agents are optimistic. In our application, this hold period is trivially counted in iterations rather than true time units.*

The scheduling algorithm (Algorithm 1) is designed using a greedy approach: assignment is based on the earliest-first principle. Additionally, cooperation between agents to complete a single task is permitted. The duration of service for each task, and thus the completion time, is then determined by some function *f<sub>s</sub>* that depends on the number of agents and their starting times.

To begin, the availability of each agent, which can be computed from the expected time of arrival at the targeted task in its path *a<sub>i</sub>*, needs to be determined. (i.e., the initial availability of an agent is first determined from the expected time of arrival at the first task in its path *a<sub>i</sub>*, which is also the initial target. After leaving the first task, the agent targets the next task in the path and determines its availability based on that task. Target transition and availability computation continue in this manner until the agent reaches the final task in its path, whereupon, having no target after leaving the final task, the availability is set to infinity.) At each iteration, the agent with the earliest availability is selected (line 4) and the serviceability of the allocated task is determined by the temporal constraints enforced on the task. For example, if task *A* is to end before task *B* begins, then task *A* will only be serviceable if an expected start time for task *B* exists. In fact, it is for this particular reason that the second assumption is necessary, since most tasks can be unserviceable in complicated scenarios if the agents are pessimistic. Additionally, it should be noted that for a task to enforce some temporal constraints, it has to be allocated (e.g., if task *C* must begin after task *D*, but task *D* is not allocated, then task *C* can be considered to have no temporal constraints). Temporal constraints do not actively enforce allocations; rather, if dependent tasks are allocated, then their assignments must be constrained.

If the allocated task is unserviceable, then the agent is placed on hold and the next available agent is considered. Otherwise, a service time window can be computed, and the agent determines the expected completion time for the task and updates its availability based on the next task in its path. At any iteration, if an agent has been placed on hold at a task for longer than the specified hold period, then it leaves the current task and transits to the next task in its path (lines 11 to 16), and will most probably never return to the skipped task. If an agent completes all of its allocated tasks, then it will no longer be considered even if it has the earliest availability. The algorithm converges when all agents can no longer proceed for any reason (e.g., all allocations have been assigned, all agents' allocated tasks are temporally constrained, etc.).

**Algorithm 1** Scheduling Algorithm: *schedule*

```
1: Input: a ← (a_i, a_-i)
2: Initialize: L = 1_n, t^s = −1_n, t^c = −1_n, hold = 0_n, Con(i, L, a, t^s, t^c) ∈ {0, 1}
3: Sort:
4:   i ← arg min_{i∈N}(availability)
5: Constraint:
6: if Con(i, L, a, t^s, t^c) = 1 then
7:   if hold_i < maxHold then
8:     hold_i = hold_i + 1
9:     i ← arg min_{i∈N\{i}}(availability)
10:    goto Constraint
11:  else
12:    hold_i = 0
13:    L_i = L_i + 1
14:    update(availability)
15:    goto Sort
16:  end if
17: Schedule:
18: else if Con(i, L, a, t^s, t^c) = 0 then
19:   L, t^s, t^c ← compute_schedule(i, L, a, t^s, t^c, f_s)
20:   L_i = L_i + 1
21:   if not first then
22:     L_0 = L
23:     L, t^s, t^c, hold, availability ← align(L, a, t^s, t^c)
24:     if L ≠ L_0 then
25:       L, t^s, t^c, hold, availability ← constraint(L, a, t^s, t^c)
26:       if ismember(L, L_hist) then
27:         return
28:       else
29:         append(L, L_hist)
30:       end if
31:     end if
32:   end if
33:   goto Sort
34: end if
```
In the event that multiple agents have been allocated to the same task, then agents will only participate if the task has yet to be completed at their time of arrival. If an agent were to participate in servicing a task, then it will update the expected completion time of the task, its own availability and also the availability of all other (previous and current) participating agents to reflect the change based on *fs*. Immediately after updating the schedule, the algorithm will invoke an alignment procedure (Algorithm 2) and possibly a temporal constraint check procedure (Algorithm 3) before moving on to the next iteration.

#### **Algorithm 2** Alignment procedure: *align*

```
1: for all agents i participating in task k do
2:   find k in a_i
3:   if index(k ∈ a_i) + 1 ≠ L_i then
4:     t^s, t^c, hold_i ← drop(tasks after k)
5:     L_i = index(k ∈ a_i) + 1
6:   end if
7: end for
8: update(availability)
```

#### **Algorithm 3** Constraint check procedure: *constraint*


The alignment procedure (Algorithm 1, line 23) is crucial when multiple agents' availabilities are updated simultaneously, ensuring that each agent's availability is in line with its targeted task. The importance of this procedure is shown in the following example.

**Example 2.** *Consider an agent i with the ordered path a<sub>i</sub>* = {*k*<sub>1</sub>, *k*<sub>2</sub>, *k*<sub>3</sub>}*. Agent i was previously expected to complete tasks k*<sub>1</sub> *and k*<sub>2</sub> *at times t*<sub>1</sub> *and t*<sub>2</sub>*, respectively, and its availability to begin work at k*<sub>3</sub> *is expected to be t*<sub>3</sub>*. However, agent j now decides to participate in task k*<sub>1</sub>*, and the expected completion time for task k*<sub>1</sub> *is brought forward to t*<sub>1</sub> − *w. Agent j then updates agent i's expected availability to some time t*<sub>2</sub> − *w, as j expects i to be at the next step in the path, k*<sub>2</sub>*, because it cannot easily predict i's availability any further than the next step. This leads to a loss of alignment between path step and expected availability, and any further scheduling will be incorrect.*

Therefore, an alignment invoked at task *k* is effectively a procedure to bring agents back to *k* such that they must reschedule all tasks in their paths after *k*. When an alignment procedure is invoked, temporal constraint violation checks (Algorithm 1, line 25) are necessary because some tasks are dropped when aligning the agents. If no agents are realigned, then a temporal constraint violation check is not necessary. However, if a temporal constraint violation is found during the checking procedure, then the tasks with violations (and all subsequent tasks in the path) will be dropped. Further alignment of all agents (Algorithm 3, line 9) is then required before checking for temporal constraint violations again.

One major issue resulting from the alignment and temporal constraint check procedures is that cycling may occur. Cycling refers to the phenomenon where, as a result of removing some conflicting task schedules, the overall schedule reverts to a historic state. Since the scheduling process is deterministic, the process is then stuck in an infinite loop. Maintaining a history of the scheduling outcome, such as the path step of every agent, **L**, whenever a constraint check procedure is invoked helps to identify cycling; if a cyclic state transition is observed (Algorithm 1, lines 26 to 27), then the scheduling is considered to have converged as defined previously, since it is infeasible to continue with the scheduling process. Please note that the scheduling algorithm is anytime (with respect to temporal constraints), as temporal constraint checks are considered at every iteration when necessary. Therefore, any schedule proposed by the algorithm at the end of an iteration is feasible, even when a cyclic state transition is present. The convergence of the algorithm when faced with cycling can be thought of as an early termination of the scheduling process which provides a feasible but incomplete schedule. Hence, the impact of cycling can be considered trivial, as the phenomenon leads to low global and local utility due to the "incompleteness" of the scheduling process, and agents are unlikely to prefer such collective action. Generally, cycling tends to occur when the number of agents considered for the problem is too low compared to the hold period; therefore, proper selection of the hold period will minimize the occurrence of cycling.
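The cycle guard (Algorithm 1, lines 26 to 29) amounts to hashing the path-step vector **L** into a history set; a sketch with assumed helper names, not the paper's code:

```python
def check_cycle(L, history):
    """Record L when a constraint check runs; True signals a revisit."""
    state = tuple(L)      # hashable snapshot of every agent's path step
    if state in history:
        return True       # cyclic transition: terminate scheduling early
    history.add(state)
    return False

hist = set()
print(check_cycle([1, 2, 1], hist))  # → False (first visit)
print(check_cycle([1, 2, 1], hist))  # → True (state seen before: cycling)
```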

**Theorem 1.** *Using the proposed game model and scheduling algorithm for a game with spatial and temporal coupled constraints, the game will always converge to a feasible solution where all the coupled constraints are satisfied.*

**Proof.** We prove by contradiction. Consider a game with both spatial and temporal coupled constraints which converged to an infeasible solution in which agent *i* selected a path *a<sub>i</sub>*′ containing a task *k<sub>j</sub>* that violates *y* spatial coupled constraints. We also know that there exists a null action which is always feasible. If *a*<sup>∗</sup><sub>−*i*</sub> violates *x* ∈ R<sup>+</sup> spatial coupled constraints, then

$$u^{k\_j}(a\_i^{\prime}, a\_{-i}^\*) = -G(x + y), \quad u^{k\_j}(\mathbf{0}, a\_{-i}^\*) = -Gx \tag{16}$$

$$
\mu\_i^{k\_j}(a\_i^{\prime}, a\_{-i}^\*) = -Gy < \mu\_i^{k\_j}(\mathbf{0}, a\_{-i}^\*) = 0 \tag{17}
$$

$$
u\_i(a\_i^{\prime}, a\_{-i}^\*) < u\_i(\mathbf{0}, a\_{-i}^\*) \tag{18}
$$

$$a\_i^{\prime} \neq \arg\max\_{a\_i \in \{a\_i^{\prime}, \mathbf{0}\}} u\_i(a\_i, a\_{-i}^\*) \tag{19}$$

Thus, *a<sub>i</sub>*′ is not the argument which maximizes the local utility of agent *i*, and therefore *a<sub>i</sub>*′ cannot be the preferred action at equilibrium, which contradicts the assumption.

Explicit consideration of temporal constraint violations in the proof is not necessary, as the scheduling algorithm always proposes a temporally feasible schedule. Since violations in the solution are determined by the assignments rather than the allocations, any task in any path that does not meet the temporal requirements will not be assigned. Therefore, to prove the feasibility of the solution it is only necessary to show that agents will never choose an action that violates spatial constraints given current observations.

#### **4. DSA with Sampling**

For equilibrium selection, the DSA algorithm was considered as the basis for improvement. DSA is preferred over other algorithms, such as log-linear learning, for its relatively low overhead, good solutions [2,24], and ease of decentralization. In theory, DSA uses a best-reply dynamic over the complete action set; therefore, when the action set is very large, DSA requires significant computational power and time to determine the best reply, making implementation impractical. Hence, by placing a limit on the number of evaluations required, it becomes possible to implement DSA in a game with a significant number of tasks.

One obvious way to do so is to constrain the action set of every agent to reduce the computational load: at each iteration, the action set is randomly sampled with a pre-defined sample size *s* (Algorithm 4, line 4). It is necessary to further ensure that the null action, which is always feasible, is included in the constrained action set so that agents always have a feasible action to consider. An agent then chooses either to maintain its action or to select an action in the constrained set using best-reply dynamics. Marden [26] took a similar approach in his payoff-based implementation of log-linear learning, albeit for different reasons, considering only one randomly chosen action at every iteration, without the restriction of having a null action. In our implementation, the rate of convergence is expected to improve due to the reduction in the number of evaluations required at every iteration. Recall that the motivation for using a degree of parallel execution in DSA is to introduce stochasticity into the model in the hope of escaping from a local optimum. This also means that the search path taken to select the Nash equilibrium is stochastic, and in a problem with multiple Nash equilibria, the game can terminate at either equilibrium when played multiple times despite having similar degrees of parallel execution. Instead of using parallel executions, stochasticity is introduced into the equilibrium selection through the constrained action sets. Intuitively, DSA with sampling will still arrive at a Nash equilibrium as *t* → ∞. A basic implementation of the game-theoretic coupled-constraint task allocation is shown in Algorithm 4.

**Algorithm 4** Game-Theoretic Implementation

```
1: Input: N, A, s, t_con, t = 1, T = 1, a_{T=0} = 0
2: while t ≤ t_con do
3:   for i ∈ N do
4:     A_{i,T} = datasample(A_i, s)
5:     assignment_{i,T−1} = schedule(a_{i,T−1}, a_{−i,T−1})
6:     u_{i,T−1} = utility(assignment_{i,T−1})
7:     mμ = 0, bestAction_{i,T} = a_{i,T−1}
8:     for a_{i,T} ∈ A_{i,T} do
9:       assignment_{i,T} = schedule(a_{i,T}, a_{−i,T−1})
10:      u_{i,T} = utility(assignment_{i,T})
11:      if u_{i,T} − u_{i,T−1} > mμ then
12:        mμ = u_{i,T} − u_{i,T−1}
13:        bestAction_{i,T} = a_{i,T}
14:      end if
15:    end for
16:    a_{i,T} = bestAction_{i,T}
17:  end for
18:  if a_{i,T} = a_{i,T−1} ∀i ∈ N then
19:    t = t + 1
20:  else
21:    t = 0
22:  end if
23:  T = T + 1
24: end while
```
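The sampling step and strict-improvement best reply of Algorithm 4 can be sketched in Python as below; here `utility` stands in for the `schedule`-then-`utility` pipeline, `None` encodes the null action, and all names are illustrative stand-ins rather than the paper's implementation.

```python
import random

def sampled_best_reply(i, a_prev, action_set, s, utility, rng=random):
    # Constrain the action set: s random samples plus the null action,
    # so a feasible choice is always available (cf. Algorithm 4, line 4).
    candidates = rng.sample(action_set, min(s, len(action_set)))
    if None not in candidates:
        candidates.append(None)       # None stands for the null action
    best, best_gain = a_prev[i], 0
    base = utility(i, a_prev[i], a_prev)
    for a in candidates:
        gain = utility(i, a, a_prev) - base
        if gain > best_gain:          # strict improvement only
            best, best_gain = a, gain
    return best
```

With a small sample size *s*, each iteration evaluates at most *s* + 1 schedules instead of the full O(*m̄*!) action set, which is where the claimed reduction in computational load comes from.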
The proposed equilibrium selection algorithm should have a lower dependency on the number of tasks, and its rate of convergence is affected by two main parameters: the number of confirmations *t*<sub>con</sub> (Algorithm 4, line 2) and the sample size *s* (Algorithm 4, line 4). Therefore, by carefully varying the number of confirmations and the sample size, the equilibrium selection can satisfy the time requirements of the game.

#### **5. Results and Discussion**

We assess the game design and equilibrium selection algorithm in a simulated disaster relief operation in a 10-by-10 grid world. In this operation, there are three types of mission-specific autonomous vehicles, each with different response capabilities, to be allocated to different types of disaster situations. The engineering vehicles are capable of clearing wreckage from collapsed structures, while the rescue vehicles extract casualties to a safe location, and the firefighting vehicles deal with fire outbreaks in the region. The agent-task compatibilities are depicted in Figure 2. Furthermore, some sites may be struck by more than one type of disaster, which then requires agents to work together, albeit with some constraints.

**Figure 2.** Agent-Task compatibilities in a disaster relief operation [30–34]. The figure is a composite image constructed from the various sources cited.

#### *5.1. Mission Coupled Constraints*


These constraints apply only when the tasks are in the same position (e.g., casualties trapped in a burning house). The mission coupled constraints for sites struck by multiple types of disaster can be reflected in the dependency matrix (Table 3) and the temporal matrix (Table 4) as follows:


**Table 3.** Mission Dependency matrix.


*5.2. Feasibility of Game Model*


Several agents and disaster sites at random locations of the grid world were considered. A summary of the disaster relief operations and agent parameters is provided in Table 5, with 20 different relief operations, each simulated as a game played 20 times. The reward for the successful completion of each task was given as a time-decaying function

$$
v\_{k\_j}(a) = \nu\_{k\_j} \exp(-\lambda\_{k\_j} t\_{k\_j}^c(a)) \tag{20}
$$

where *ν<sub>kj</sub>* is the intrinsic value of the task and *λ<sub>kj</sub>* is the discount factor for task *k<sub>j</sub>*, reflecting the urgency of the tasks and motivating the agents to work efficiently.
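Numerically, the decay in Equation (20) behaves as follows (the task values below are hypothetical):

```python
from math import exp

def task_reward(nu, lam, t_completion):
    # Eq. (20): intrinsic value nu discounted by completion time.
    return nu * exp(-lam * t_completion)

early = task_reward(nu=100.0, lam=0.1, t_completion=2.0)
late = task_reward(nu=100.0, lam=0.1, t_completion=10.0)
assert early > late  # finishing sooner preserves more of the reward
```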


**Table 5.** Disaster relief operation details and simulation parameters.

In Table 5, the arrow (→) depicts a dependency relationship (i.e., a task of each type in the same location leading to a coupled constraint).

The feasibility of the game model using DSA with sampling can be inferred from the satisfaction of all coupled constraints in all of the solutions obtained, and the stochasticity of DSA is evident from the variation in global score values and computational times across the games for every operation scenario. For a centralized approach, the computational time for each game is below 1 min, and for a synchronous decentralized approach, the estimated computational time generally falls below 10 s. In a decentralized approach, each agent runs the scheduling algorithm independently, based on the other agents' previous actions. After evaluating the possible schedules, the agent decides on its choice of action and communicates its decision to all other agents. Without explicitly considering the means of communication, but assuming it is synchronous, the simulation times of a decentralized approach are estimated based on the slowest agent at each iteration plus the overheads accumulated in the communication process. A synchronous decentralized approach can therefore provide a feasible solution quickly.

Figure 3 provides a graphical interpretation of a proposed schedule for one of the simulated games, and information on the labels in the schedule is provided in Table 6.

**Figure 3.** Example of a feasible schedule.



The schedule in Figure 3 represents but one of many possible solutions to the simulated scenario. For a complex multi-agent, multi-task allocation problem coupled with a flexible hold period, multiple Nash equilibria are likely to exist, and the stochasticity of the agents' actions due to random sampling varies the route of progression of each game, which thereby terminates at various different solutions. Regardless of the progression, the ability of the scheduling algorithm to satisfy the temporal constraints is evident from the complementary behaviors portrayed in the solution (e.g., a *rescue*-type agent (*A4*) waits at a *casualty*-type task (*13*) for a short period so that the temporal constraint requiring the *rescue* (*13*) to occur during *firefighting* (*12*) can be satisfied; the *rescue*-type agent (*A4*) attempts a *casualty* rescue task (*21*) after the *wreckage* (*20*) has been cleared by an *engineer*-type agent (*A1*), etc.).

Moving beyond providing a feasible solution, the game should converge within a reasonably short time to ensure the practicality of a game-theoretic implementation in real-world operations. Hence, further studies on the number of confirmations *t*con and the sample size *s*, the main parameters influencing the rate of convergence, are presented in the subsequent sections.

#### *5.3. Number of Confirmations*

In this section, the effects of varying the number of confirmations on the global score and computation times are examined. An operation setting similar to the example in Table 5 was simulated with the number of confirmations set to 10, 50, 100, 500, and 1000, respectively; the results are shown in Figure 4, with means and standard deviations for the various aspects of the simulation presented in Tables 7–9. The observed global scores are broadly similar across the selected levels of confirmation, and the standard deviation generally decreases as the number of confirmations increases, providing a tighter bound on the range of scores. Intuitively, the decrease in standard deviation reflects the increasing probability of the solution being a Nash equilibrium (refer to Appendix A).

With only 10 confirmations, the mean global score is recognizably lower, and the range of scores is noticeably wider than for games with higher levels of confirmation. This lower performance can be attributed to the severity of the constraints placed on the agents' action sets. With a sample size of 20 and 10 confirmations, an agent is exposed to at most 191 unique actions, which covers barely 5% of all possible actions. (The exposure of an agent refers to the number of actions it has seen since it last changed its action.) It is therefore plausible that the agents lack sufficient understanding of the full range of possible actions, leading to less efficient agreements. However, these inefficiencies lead to a global score that is merely 3% lower, as seen in Figure 5. Indeed, it can be argued that agents do not need to understand the entirety of their action sets to arrive at reasonable solutions. With 50 confirmations, the maximum exposure is 951 unique actions, or approximately 26% of all possible actions, yet the mean global scores are comparable to those at higher levels of exposure. With 1000 confirmations, it is highly likely that every agent has seen its entire action set, since there are only 3610 possible actions while each agent is exposed to 19,001 non-unique actions. Interestingly, the maximum scores achieved in each category are similar, with a value of 259.862, except for simulations with only 10 confirmations, whose maximum is slightly lower at 259.791. This phenomenon will be amplified in dense task allocation settings, since most tasks will be closely grouped together and most action sequences will provide similar levels of reward.

As such, most Nash equilibria will provide near-optimal solutions, and it is therefore unnecessary for agents to have a complete understanding of their possible actions, since emphasis is not placed on achieving the solution with the optimal global score or the Nash equilibrium with the maximum global score. When constraints are imposed at every iteration, agents have fewer choices to make at each iteration, allowing them to arrive at a general agreement more quickly and easily, as evident in the short computation times for games with only 10 confirmations. Improvements to the agreement are then made as the agents continue to explore the possible action space at each iteration.
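One plausible reading of the exposure counts reported above (191 unique actions for a sample size of 20 with 10 confirmations, 951 with 50 confirmations) is that an agent retains its current action and draws *s* − 1 fresh samples in each iteration. Under that assumption, which we state as an interpretation rather than the paper's definition, the bound is:

```python
def max_unique_exposure(s, t):
    """Upper bound on the unique actions an agent can see between its
    last action change and convergence: the retained current action
    plus s - 1 fresh samples in each of t iterations (assumption)."""
    return 1 + t * (s - 1)
```

This reproduces the counts in the text: 191 for (20, 10), 951 for (20, 50), and 19,001 draws for (20, 1000), which exceeds the 3610 possible actions and is therefore counted as non-unique exposure.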

**Figure 4.** Variations in global scores and computation times due to number of confirmations.

However, it should be noted that increasing the number of confirmations does not necessarily increase the global score for every play. Recall that in DSA, progression is stochastic, so the Nash equilibrium obtained may differ from play to play; there is thus a non-zero probability of reaching the Nash equilibrium with the lowest possible score in one play despite a high number of confirmations, while a play with a lower number of confirmations reaches a Nash solution with a higher score.

Another observation, from Figure 6, is that the computation times increase with the number of confirmations, as expected. In essence, a confirmation is defined as an iteration in which no agent changes its action. This also means that, in the newly constrained action set of any agent at the current iteration, there is no action that can improve that agent's local utility. For *t*con confirmations, no agent changes its action for *t*con consecutive iterations. When the agreement between the agents is not a Nash equilibrium, the probability of such an event occurring decreases with increasing values of *t*con. Therefore, as the number of confirmations increases, confidence in the solution being a Nash equilibrium increases, at the expense of longer computation times due to the additional evaluations made by the agents.
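The confirmation-counting termination rule can be sketched as below. `step` is a hypothetical callback standing in for one synchronous DSA iteration; it returns True if any agent changed its action.

```python
def run_until_confirmed(agents, step, t_con, max_iter=100000):
    """Run synchronous DSA-style iterations until no agent changes its
    action for t_con consecutive iterations; a sketch, not the paper's
    implementation. Returns True on convergence, False on hitting the
    iteration cap."""
    unchanged = 0
    for _ in range(max_iter):
        changed = step(agents)
        # Reset the confirmation counter whenever any agent moves.
        unchanged = 0 if changed else unchanged + 1
        if unchanged >= t_con:
            return True
    return False
```

Because the counter resets on every change, reaching *t*con requires *t*con consecutive quiet iterations, which is exactly the event whose probability (when the agreement is not Nash) shrinks as *t*con grows.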


**Table 7.** Mean and standard deviation for global scores.

**Table 8.** Mean and standard deviation for centralized computation times.


**Table 9.** Mean and standard deviation for estimated decentralized computation times.


**Figure 5.** Mean global score values for various number of confirmations.

**Figure 6.** Mean computation times for various number of confirmations.

#### *5.4. Size of Samples*

To investigate the impact of variation in sample size, the dense allocation setting given in Table 10 is considered. The world remains a 10-by-10 grid, but the numbers of agents and tasks are increased significantly, and the sample size was varied across 20, 40, 60, 80, and 100. The simulation results for the dense task allocation game are displayed in Figure 7 and Tables 11–13.


**Table 10.** Crowded disaster relief operation details and simulation parameters.

Predictably, the mean global score increases with the sample size. Sample sizes of 20, 40, 60, 80, and 100 provided each agent with exposure to approximately 0.3%, 0.6%, 0.9%, 1.2%, and 1.4% of its entire action set, respectively. Such low levels of understanding of the action set usually mean that agreements are suboptimal, since the probability that the existing agreement can still be improved remains rather high. Despite this suboptimality, it should be noted that a 5-fold increase in sample size from 20 to 100 improved the score by a mere 5%, as seen in Figure 8. The logarithmic growth trend of the global scores further supports our belief that agents can make reasonably good scheduling decisions despite an incomplete, and possibly small, understanding of their action sets in a dense task allocation setting.

More importantly, as shown in Figure 9, the computation times increase exponentially with the sample size. Without sampling in the equilibrium selection, a game-theoretic approach to task allocation in a dense emergency operation would simply be impractical. With random sampling, the computation times were suppressed while still providing reasonably good solutions, enabling game-theoretic implementations in emergency operations, which tend to be crowded, complex, and time-sensitive. In the simulated game, the mean computation times for the centralized and decentralized approaches were kept well below 60 min and 10 min, respectively, when the sample size was kept below 60. Reasonably good solutions were also obtained quickly with a sample size of only 20.

**Figure 7.** Variations in global scores and computation times due to size of sample.

**Table 11.** Mean and standard deviation for global scores.



**Table 12.** Mean and standard deviation for centralized computation times.

**Table 13.** Mean and standard deviation for estimated decentralized computation times.


**Figure 8.** Mean global score values for various sample sizes.


**Figure 9.** Mean computation times for various sample sizes.

#### **6. Conclusions**

In this paper, we introduced a game-theoretic framework for the allocation of tasks with coupled spatial and temporal constraints, first seen in CCBBA. Modeling such problems game-theoretically helps to overcome potential convergence issues faced by market-based allocation models by leveraging the properties of potential games. A well-designed game model alone can effectively handle spatial relationships among tasks and, when coupled with the scheduling algorithm, allows feasible assignment of tasks with both spatial and temporal dependencies. Additionally, existing equilibrium selection algorithms have commonly been restricted by the size of the problem, making game-theoretic task allocation impractical for large-scale problems. However, by using random sampling in DSA, it is possible to obtain a reasonably good solution within a short time, enabling game-theoretic implementations in emergency operations, which frequently involve large numbers of tasks with complex relationships and yet require quick allocation.

The greatest limitation of the proposed methodology lies in its inability to quantify or guarantee the optimality of any given solution in terms of the global allocation score, even though it can provide a feasible allocation that respects the spatial and temporal relationships between the different types of task. A plausible explanation is the existence of multiple Nash equilibria combined with the probabilistic nature of DSA. Additional studies that restrict the game model to a single Nash equilibrium, or that consider deterministic equilibrium selection algorithms capable of selecting the Nash equilibrium with the maximum global allocation score, would help to overcome this deficiency.

Despite the current flaws in the proposed methodology, reasonably good solutions can still be obtained quickly to tackle coupled-constraint task allocation problems in a crowded space.

**Author Contributions:** Conceptualization, M.C.L. and H.-L.C.; methodology, M.C.L.; software, M.C.L.; validation, M.C.L. and H.-L.C.; formal analysis, M.C.L.; investigation, M.C.L.; resources, H.-L.C.; data curation, M.C.L.; writing—original draft preparation, M.C.L.; writing—review and editing, M.C.L. and H.-L.C.; visualization, M.C.L.; supervision, H.-L.C.; project administration, H.-L.C.; funding acquisition, H.-L.C.

**Funding:** This work was supported in part by Information Communication Technology Research and Development program of Institute for Information & Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government Ministry of Science and ICT (NO.R1711055271, Development of High Reliable Communications and Security Software for Various Unmanned Vehicles).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

CCBBA Coupled-Constraint Consensus-Based Bundle Algorithm


#### **Appendix A**

A Nash equilibrium is defined as a state in which no player can improve its own position through a unilateral change in strategy; it can be identified when all players stop changing their actions for *t* iterations. In theory, a Nash equilibrium can only be guaranteed as *t* → ∞. For some equilibrium selection algorithms, this may be the only means of identification, and the solution obtained cannot be guaranteed to be a Nash equilibrium, since *t* → ∞ is impossible to implement. That said, it is still possible to qualify the Nash properties of a solution probabilistically.

As the game model proposed in this paper is an exact potential game, there exists at least one collective action $a^*$ that is a Nash equilibrium. Assuming that all players except *i* are at the Nash equilibrium, the only possible reason that $a_i(t) \neq a_i^*$ at iteration *t* is that $a_i(t-1) \neq a_i^*$ and $a_i^*$ is not in the constrained action set for the current iteration, since the proposed equilibrium selection algorithm either maintains the previous action or selects the action in the constrained action set that maximizes local utility, and $a_i^*$ is that maximizer. Therefore, the probability that $a_i \neq a_i^*$ is upper bounded by

$$\delta\_i \le \frac{(|\mathcal{A}\_i| - s)(|\mathcal{A}\_i| - 1)}{|\mathcal{A}\_i|^2} \tag{A1}$$

where $\mathcal{A}_i$ is the complete action set for player *i* and *s* is the sample size. Given that sampling is independent for each player at each iteration, if all players do not change their actions for *t*con iterations, then the probability that the solution is a Nash equilibrium is at least

$$1 - \delta \ge \prod\_{i \in \mathcal{N}} 1 - \left(\frac{(|\mathcal{A}\_i| - s)(|\mathcal{A}\_i| - 1)}{|\mathcal{A}\_i|^2}\right)^{t\_{\text{con}}}.\tag{A2}$$

Since $(|\mathcal{A}_i| - s)(|\mathcal{A}_i| - 1)/|\mathcal{A}_i|^2 < 1$, $1 - \delta$ approaches 1 as $t_{\text{con}} \to \infty$, so DSA with sampling converges to a Nash equilibrium with probability 1 regardless of the sample size. Equation (A2) allows the solution to be qualified with respect to the simulation parameters *s* and *t*con by providing a lower bound on the probability of the solution being a Nash equilibrium. While this qualification does not reflect the optimality of the solution in terms of global score, it can serve as a guide for selecting simulation parameters to balance confidence against speed of convergence as required. Increasing either the number of confirmations *t*con or the sample size *s* increases not only the confidence in the solution being a Nash equilibrium but also the computation time, as seen in the results and discussion. Therefore, depending on whether emphasis is placed on social efficiency or on fast convergence, the simulation parameters can be adjusted accordingly.
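The lower bound in Equation (A2) can be computed directly for candidate parameter values; a minimal sketch (the function name is ours):

```python
def nash_probability_lower_bound(action_set_sizes, s, t_con):
    """Lower bound (Equation (A2)) on the probability that a solution in
    which no agent changed its action for t_con iterations is a Nash
    equilibrium, under independent uniform sampling of size s.

    action_set_sizes: |A_i| for each player i.
    """
    p = 1.0
    for A in action_set_sizes:
        delta_i = ((A - s) * (A - 1)) / (A ** 2)  # Equation (A1) bound
        p *= 1.0 - delta_i ** t_con
    return p
```

For instance, with four players each holding 3610 actions and a sample size of 20, the bound grows monotonically with *t*con, and when *s* = |A<sub>i</sub>| the bound is exactly 1, matching the intuition that full sampling leaves no undiscovered improving action.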

#### **References**


c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Woc-Bots: An Agent-Based Approach to Decision-Making**

#### **Sean Grimes and David E. Breen \***

Department of Computer Science, Drexel University, Philadelphia, PA 19104, USA; spg63@drexel.edu **\*** Correspondence: david@cs.drexel.edu

Received: 30 September 2019; Accepted: 29 October 2019; Published: 1 November 2019

**Abstract:** We present a flexible, robust approach to predictive decision-making using simple, modular agents (WoC-Bots) that interact with each other socially and share information about the features they are trained on. Our agents form a knowledge-diverse crowd, allowing us to use Wisdom of the Crowd (WoC) theories to aggregate their opinions and come to a collective conclusion. Compared to traditional multi-layer perceptron (MLP) networks, WoC-Bots can be trained more quickly, more easily incorporate new features, and make it easier to determine why the network gives the prediction that it does. We compare our predictive accuracy with MLP networks to show that WoC-Bots can attain similar results when predicting the box office success of Hollywood movies, while requiring significantly less training time.

**Keywords:** classification; prediction; multi-agent; wisdom-of-crowds; Hollywood; feature-extension; collective-intelligence; swarm

#### **1. Introduction**

We, humans, want to predict the future: disease outbreaks and risk factors, business success, economics, and many more applications can benefit from better forecasting. Researchers have developed many tools to help us make predictions, with artificial neural networks (ANNs) being a currently popular choice. ANNs can be used for classification, allowing us to take, for example, a series of features about an upcoming movie and determine, with fairly high accuracy, whether the movie will be successful. ANNs, however, typically require a large amount of training data and compute time, and they do not generalize well to other topics. We cannot use an ANN trained on Hollywood movies to help us determine whether a sports team will win an upcoming game or where the next 'hot spot' in an epidemic will be; they are inherently inflexible. Recent efforts improve their flexibility by adding to their basic design, as seen in transfer learning [1]; however, this increases complexity and compute/data requirements while further obfuscating the internal workings of an ANN, making it *even more* difficult to answer the "why" behind an outputted classification [2].

Prediction markets (PMs) are designed to determine the probability of a future event taking place. Well-designed PMs encourage agents, human or computer-based, to contribute information to the market by trading shares, incentivize correct, truthful information sharing, and then aggregate the information from individual agents into collective knowledge [3]. PMs work because the aggregate knowledge of the group will generally be more precise and complete than the knowledge held by any individual within the group. However, participants are expected to be well informed about the topic being predicted, which, with current technology, requires human participants [4]. Additionally, computer agent-based PMs are difficult and programmer-intensive to create. On computer-based agents, Othman said, "agent-based modeling of the real world is necessarily dubious. Attempting to model the rich tapestry of human behavior within economic structures—both the outstandingly bad and the terrifically complex—is a futile task [5]." Even if it were possible to model the complexity of human knowledge and decision-making within some narrow topic, it would be extremely difficult to generalize across topics. We can instead consider simpler agents that do not attempt to model human interaction. Such agents have been studied in simple academic scenarios with some success when their possible actions are limited and their opponents are not adversarial [6,7]. However, work by Othman and Sandholm [8] has shown that simply changing the order in which agents participate in the market can drastically impact its outcome, indicating that "markets may fail to do any meaningful belief aggregation."

An alternative to PMs is the Wisdom of the Crowd (WoC). WoC takes the approach that the opinion of a large, diverse group will be more accurate than any individual opinion within the group, given a sufficiently competent aggregation mechanism [9]. The classic demonstration is guessing how many jelly beans are in a jar at a county fair: typically, no individual is consistently able to get close to the correct amount, but the aggregate opinion of the group is generally very close to the correct number. WoC does not expect or require expert knowledge. Scott Page stated that "the squared error of the collective prediction equals the average squared error minus the predictive diversity" [10], meaning the more diverse the crowd, the smaller the predictive error.

In this paper we present a robust, computer-agent-based approach to making predictions about the success of Hollywood movies that can be easily distributed across multiple computational nodes [11]. We take a WoC approach, using simple agents (WoC-Bots) without expert knowledge that are trained with different, small, subsets of features that describe the movies. This initially gives us a group of agents with a diverse and independent set of knowledge. The agents interact with one another socially, sharing some knowledge, determining the trust they have in other agents and the confidence they have in their own opinion, and changing their opinion given enough evidence. Following this interaction an overall conclusion is drawn from the crowd using a trust and performance-based aggregation mechanism. Our system was compared with traditional multilayer perceptron (MLP) networks trained with the full set of features available to the agents, as well as a subset of the most highly correlated features. We show that WoC-Bots are able to achieve more accurate classification results, with reduced training time and resistance to feature drop-out.

#### **2. Methods & Design**

The test scenario for this research involves predicting whether a movie will be a success. Success was defined as the reported revenue being greater than 2× the reported budget for the movie. It is difficult to determine exactly what revenue constitutes a success, and it differs on a movie-by-movie basis. However, advertising and promotion budgets are generally less than the production budget, which suggests studios should start to see positive cash flow once a movie makes 2× its production budget [12]. Additionally, defining success in this way split our data, discussed further in Section 2.1, roughly equally between success (47.5%) and failure (52.5%).
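As a concrete illustration of this labeling rule (the function name is ours, not the paper's):

```python
def is_success(revenue, budget):
    """Binary label used in this test scenario: success if reported
    revenue is strictly greater than twice the reported budget."""
    return revenue > 2 * budget
```

Note the strict inequality: a movie earning exactly twice its budget is labeled a failure under this reading.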

#### *2.1. Datasets & Libraries*

We primarily used two datasets for this work:


The MovieLens dataset provides information for more than 27,000 movies, while the TMDb dataset includes 5000 movies. Only movies found in both datasets, with complete information for all features used for classification, were considered. The features used for classification are listed in Table 1. A note about the genre feature: only the first two listed genres were considered for each movie in classifiers that used the genre feature(s), and each genre was assigned a unique numeric ID. We were left with 4722 possible movies for testing and training; however, 1023 movies appeared to contain incorrect information, e.g., negative values for movie budget or movie revenue, and were removed from our testing and training subsets. We used both datasets to help reduce sparse areas in the data for less popular and older movies. The data was split into two subsets, with 2959 randomly selected examples used for training and 740 examples used for testing. The training subset was randomly selected from the full dataset at the start of each simulation, with the remaining examples being used for testing.


**Table 1.** Features available for classification.

The data was transformed in the following ways:


Sample training and testing CSV files encompassing the transformed data can be found at https://data.mendeley.com/datasets/gj66mt4s4j/2, while the code required to reproduce the results presented in this article can be found at https://github.com/spg63/MDPIApplSciCodeRepo; the code will also be made available upon request to Sean Grimes or David E. Breen. Eclipse Deeplearning4j (DL4J) (https://deeplearning4j.org/) [14] is "an open-source, distributed deep-learning project in Java and Scala spearheaded by the people at Skymind (https://skymind.ai/), a San Francisco-based business intelligence and enterprise software firm." DL4J (versions 0.9.1 and 1.0.0-beta3) was used as the neural network library, providing the core multilayer perceptron classifier used by each agent (discussed in more detail in Section 2.2.1). Additionally, DL4J was used to build and test the larger MLP classifiers that were compared with our agent-based approach. All non-DL4J library code was written in Kotlin (https://kotlinlang.org/) (versions 1.3.20–1.3.41), running on the Java Virtual Machine (JVM) (https://www.java.com/en/download/) (versions 1.8.0\_151–1.8.0\_211). All feature data, agent history data, and trained agents were stored in various databases, using SQLite3 (https://www.sqlite.org/index.html) as the database engine.

#### *2.2. Agent Design*

Agents are designed to be modular, presenting an interface that includes different algorithms for all aspects of their behavior. Agents are responsible for coordinating and managing the following functions:


#### 2.2.1. Classifier

Each agent contains a very small, very simple MLP classifier. All agents were configured with similar classifiers, each containing 2–4 input nodes (one per feature), a single hidden layer containing numInputNodes \* 2 nodes, and an output layer with two output nodes, one for each of the two output classes, "success" and "failure". All classifiers used the DL4J implementation of the Adam updater [15] for the learning rate, the softmax activation function [16], and traditional stochastic gradient descent for optimization.

Each agent's classifier was given a single feature in common with all other agents, the movie budget. Other features were spread across multiple agents, occasionally in pairs (e.g., budget & vote\_count & vote\_average), but more frequently a single feature in addition to budget. Classification performance was the determining characteristic in assigning features to agents, with very low (<50% accuracy) performing combinations dropped during early testing in favor of decreased computational complexity.
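A rough architectural sketch of one agent's classifier follows. It is forward-pass only, with random placeholder weights rather than trained ones, and the ReLU hidden activation is our assumption; the paper specifies only the softmax output and Adam/SGD training via DL4J.

```python
import math
import random

def make_mlp(n_inputs, rng=random.Random(0)):
    """Architecture sketch of one agent's classifier: n_inputs inputs,
    a single hidden layer of 2 * n_inputs nodes, and 2 output nodes
    ("success", "failure"). Weights are random placeholders."""
    n_hidden = 2 * n_inputs
    w1 = [[rng.gauss(0, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(2)]
    return w1, w2

def forward(mlp, x):
    """Forward pass: ReLU hidden layer (assumption), softmax output."""
    w1, w2 = mlp
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    z = [sum(w * hi for w, hi in zip(row, h)) for row in w2]
    e = [math.exp(v - max(z)) for v in z]  # numerically stable softmax
    s = sum(e)
    return [v / s for v in e]
```

The softmax output yields a probability for each class, which maps naturally onto the initial confidence values the agents carry into the interaction arena.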

#### 2.2.2. Agent Initialization & Movement

All agents currently participate and interact within a centralized 'interaction arena' (Arena) managed by a central controller responsible for registering agents, confirming that their initial locations are valid, and validating each movement. All agents must be initialized within the bounds of the arena and into an empty space. All movements must stay within the bounds of the arena, and at most two agents may occupy any space within the arena; interactions between three or more agents at the same location are not currently supported. Centralized control is currently used to ease implementation, but it is not a requirement: agents are capable of validating location and movement on their own or, if required, of having some number *n* of other agents confirm all positioning as valid in a decentralized manner. The Arena can take any 3D shape composed of rectangles, allowing arrangements from a simple 4 × 4 square to something more complex with multiple rooms, floors, and restricted movement between them, e.g., a simulated building.

Agents are currently responsible for maintaining a history of their movements within the arena, a history of interactions, and a history of how each interaction affected their internal belief. Historical information for each agent is stored in memory during each iteration and dumped to individual SQLite tables for long-term storage.

Agents are initialized with an InitializationAlgorithm and a MovementAlgorithm, which implement the simple interfaces init() and move(), respectively. init() has a single goal: initialize the agent in an empty space in the arena. Initialization can be random, or it can account for complexities such as the locations of other agents, placing agents with similar features close together to localize similar information, or spreading them out to facilitate transmission of information between dissimilar agents. Localizing similar information may allow a group of similar agents to come to an optimal conclusion, whereas spreading similar agents out may allow the best-performing agents to convince others of their opinions [17]. The InitializationAlgorithm used in this work randomly initialized agents within a rectangular arena.

move() also has a simple goal: move the agent within the arena. move() can be as simple or complex as necessary; it can randomly select a space within the arena and 'teleport' the agent to the new space, or it can require the agent to move toward some target location or another agent. Two agents interact when they move onto the same space. The MovementAlgorithm used in this work moved agents randomly in a "Manhattan-like" fashion, allowing each agent to move one step north, south, east, or west within the bounds of the arena.
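The random "Manhattan-like" step can be sketched as below. Clipping a step at the arena boundary (so the agent stays put rather than re-rolling) is our assumption; the paper states only that moves remain within bounds.

```python
import random

def manhattan_move(x, y, width, height, rng=random):
    """One random 'Manhattan-like' step: north, south, east, or west,
    clipped to the arena bounds (boundary handling is an assumption)."""
    dx, dy = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
    nx = min(max(x + dx, 0), width - 1)
    ny = min(max(y + dy, 0), height - 1)
    return nx, ny
```

Repeated application keeps an agent inside the arena while producing the random-walk mixing that drives pairwise encounters.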

#### 2.2.3. Interaction & Scoring

The InteractionAlgorithm and ScoringAlgorithm are both designed to be modular, with the InteractionAlgorithm being responsible for deciding with whom an agent should interact, truthfulness, and trust updating. The InteractionAlgorithm is required to implement three functions, shouldInteract() which determines if the agent is interested in interacting with another agent, truth() which determines if the agent should be truthful with another agent, and updateTrust() which tries to update the other agent's trust score. updateTrust() is allowed to be a NO-OP function when it is not desirable to update other agents' trust scores. This work assumes all interactions are acceptable, doesn't limit repeat interactions, and requires all interactions to be truthful.

The ScoringAlgorithm determines how interactions update an agent's internal belief state. Agents are initialized with specific internal values, referenced in Table 2, that are (in part) updated during each interaction. Many of these values are made available to other agents during interaction, allowing each agent to determine how certain another agent is, what that agent's initial classification values were, and how much influence it will allow the agent to have over its current belief.


**Table 2.** Internal scoring variables.

Similar to the previous algorithm interfaces, the ScoringAlgorithm used by each agent allows the scoring to be implemented in the way most appropriate to the given problem. The algorithm is required to implement a single function, updatePrediction, which updates the current binary prediction based on information from the most recent interaction. The ScoringAlgorithm used in this study works as follows. First, the agent (agent *a*) determines how willing it is to accept information from another agent (agent *b*). This willingness is a function of *a*'s current certainty, where *acertainty* represents *a*'s current certainty and *aacceptance* represents *a*'s willingness to accept information from *b*:

$$a\_{acceptance} = 1.0 - a\_{certainty} \tag{1}$$

Agent *a* then determines how much influence *b* should have (*binfluence*).

$$b\_{influence} = b\_{confidence} \* a\_{acceptance} \* b\_{trustCertainty} \tag{2}$$

where *bconfidence* represents *b*'s confidence and *btrustCertainty* represents *b*'s trust\_score ∗ certainty. *b*'s influence is modified based on its prior performance, where *bpriorPerf* represents *b*'s prior performance, a value between 0.7 and 1.3, as noted in Table 2, and *bcorrectedInfluence* represents this modified value,

$$b\_{correctedInfluence} = b\_{priorPerf} \* b\_{influence} \tag{3}$$

*bcorrectedInfluence* is multiplied by −1 if *b*'s opinion (success or failure) differs from *a*'s opinion. *a*'s new certainty, *acertainty*, is now calculated by Equation (4), where *a*'s certainty is increased if both *a* and *b* have the same belief and diminished if they disagree,

$$a\_{certainty} = a\_{certainty} + b\_{correctedInfluence} \tag{4}$$

*a* now checks if it should flip its opinion, which it does if *acertainty* is less than 0.50. Finally, *a* updates its certainty if its opinion changed,

$$a\_{certainty} = 1.0 - a\_{certainty}.\tag{5}$$
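Equations (1)–(5) compose into a single update per interaction. The sketch below (function and parameter names are ours) applies them in order:

```python
def update_after_interaction(a_certainty, a_opinion,
                             b_confidence, b_trust_certainty,
                             b_prior_perf, b_opinion):
    """One scoring update for agent a after meeting agent b,
    following Equations (1)-(5). Opinions are booleans
    (True = "success"). Returns a's new (certainty, opinion)."""
    acceptance = 1.0 - a_certainty                              # Eq. (1)
    influence = b_confidence * acceptance * b_trust_certainty   # Eq. (2)
    corrected = b_prior_perf * influence                        # Eq. (3)
    if b_opinion != a_opinion:
        corrected = -corrected                                  # disagreement
    a_certainty = a_certainty + corrected                       # Eq. (4)
    if a_certainty < 0.50:                                      # flip opinion
        a_opinion = not a_opinion
        a_certainty = 1.0 - a_certainty                         # Eq. (5)
    return a_certainty, a_opinion
```

Agreement raises *a*'s certainty; a sufficiently influential disagreeing partner pushes certainty below 0.50, flipping *a*'s opinion and restoring certainty by Equation (5).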

#### *2.3. Opinion Aggregation*

Effective opinion aggregation is an open question with many different possible approaches [18]. This research hopes to contribute more to this area in the future. We implement a voting system, where each agent receives a maximum of 100 possible votes for their preferred outcome, success or failure. We considered three methods of vote aggregation. The first and simplest method gives equal weight to each agent regardless of performance, the Unweighted Mean Model (UWM) [19]. The second method gives each agent votes based on prior accuracy, where an 80% accuracy rate would result in 80 votes, similar to the Weighted Voter Model presented in [20]. Agents are initially allowed 50 votes each until an accuracy for prior performance can be determined. The third method we used is similar to the second, however it also takes into account the trust score that other agents are allowed to modify, giving more granular control over how much influence an agent has on the aggregate opinion. Total votes for agent *a* is represented by *atotalVotes*, where *apriorAccuracy* represents *a*'s prior accuracy and *atrust* represents *a*'s trust score,

$$a\_{totalVotes} = \left( \left( a\_{priorAccuracy} + a\_{trust} \right) / 2 \right) \* 100. \tag{6}$$
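A minimal sketch of the third aggregation method (Equation (6)); both inputs are assumed to lie in [0, 1], so an agent receives at most 100 votes:

```python
def total_votes(prior_accuracy, trust):
    """Eq. (6): votes granted to an agent under the third
    aggregation method (prior accuracy plus mutable trust score)."""
    # Both terms are in [0, 1], capping each agent at 100 votes.
    return ((prior_accuracy + trust) / 2) * 100
```

For example, an agent with 80% prior accuracy and a trust score of 0.6 would receive 70 votes.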

During an interaction, agent *a* is allowed to modify agent *b*'s trust score (*btrust*) based on how *b* has performed in the past, whether *a* and *b* are in agreement (*doAgree*), and whether prior information *a* received from *b* was correct. Agent *a* checks its interaction history for interactions with *b* to determine what percentage of *b*'s past advice was correct (*bpercCorrect*). If there are no prior interactions, the trust score is not modified. The trust score can be modified by a maximum of 5% during each interaction.

$$b\_{trust} = b\_{trust} + 0.05 \left( b\_{percCorrect} \* b\_{priorPerf} \right) \* doAgree \tag{7}$$
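Equation (7) can be sketched as follows. Here `do_agree` is assumed to be an indicator that is +1 when *a* and *b* agree and -1 when they disagree (the text does not spell out its encoding), so trust rises on agreement with a historically correct agent and falls otherwise:

```python
def update_trust(b_trust, b_perc_correct, b_prior_perf, do_agree):
    """Eq. (7): agent a's adjustment of agent b's trust score.

    b_perc_correct: fraction of b's past advice to a that was correct.
    b_prior_perf:   b's prior-performance factor (0.7-1.3, Table 2).
    do_agree:       +1 if a and b currently agree, -1 otherwise
                    (assumed encoding).
    """
    return b_trust + 0.05 * (b_perc_correct * b_prior_perf) * do_agree
```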

A high-level overview of the training, interaction, and voting process is shown in Figure 1. The internal MLP of each agent is trained on the available training data; the agents are presented with a binary question; and they are initialized in an arena where they move and interact for some number of steps (bounded by time or by total interactions). Agents are then assigned votes according to the system described in Section 2.3 and vote at the end of the interaction period.
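The flow in Figure 1 can be summarized as a skeleton loop (hypothetical names, not the authors' code; the real interaction step applies Equations (3)–(5) and (7), which we elide here):

```python
import random

def interact(a, b):
    # Placeholder: the real step updates certainty and trust
    # per Eqs. (3)-(5) and (7).
    pass

def run_round(agents, steps, rng):
    """Skeleton of one simulation round: move/interact, then vote.

    Each agent is a dict with 'opinion' (bool) and 'votes' (Eq. (6)).
    Random pairing stands in for movement within the arena.
    """
    for _ in range(steps):
        a, b = rng.sample(agents, 2)
        interact(a, b)
    # Aggregate votes for the collective binary answer.
    yes = sum(ag["votes"] for ag in agents if ag["opinion"])
    no = sum(ag["votes"] for ag in agents if not ag["opinion"])
    return yes > no  # True = collective prediction of success
```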

**Figure 1.** Agent training, initialization, movement, interaction, and voting.

#### **3. Results**

We compared the results from our social, agent-based approach to the results produced by multiple configurations of a traditional, monolithic MLP developed in DL4J. All agent classifiers were trained on 4× Nvidia GTX 1070 GPUs using CUDA (versions 10.0–10.1 update 2) through the DL4J library. Table 3 shows the configuration and accuracy under different training conditions for each of the 10 agents. All agent classifiers were trained in parallel, taking an average of 2.8 s for 5 epochs and 22 s for 50 epochs; there was no accuracy improvement beyond 50 epochs. Once trained, the agents can participate in any decision-making configuration without re-training their classifiers. The best performing agent classifier was the budget, vote\_count, vote\_average agent, with the most important feature being budget, followed by vote\_count. The worst performing agent was the budget, runtime agent.


**Table 3.** Agent Classifier Accuracy for 5 and 50 epochs.

We tested 10 MLP networks with a variety of feature sets, one of which included the final revenue. Figure 2a shows the accuracy of five classifier configurations when trained for 5 and 50 epochs. The figure shows the change in classifier accuracy as various features are removed as inputs to the network.

(**a**) Accuracy for single hidden layer MLP classifier. The y-axis shows which features were included.

(**b**) Accuracy for agent interaction. The y-axis shows features included. **Figure 2.** Comparison of MLP and Woc-Bots performance.

All MLP classifiers were trained individually with an average training time of 2.6 s for 5 epochs and 21.2 s for 50 epochs. Comparing training times with the social agents, training 10 MLP classifiers in parallel took 1 min and 1 s (3 min and 32 s if computed sequentially) vs. 22 s to train 10 agents for 50 epochs. It should be noted that inference is slower using WoC-Bots: it takes an average of 260 milliseconds to test 740 examples using an MLP classifier incorporating all the features listed in Table 1, while the WoC-Bots, encompassing the same feature set, take an average of 13.4 s to test the same 740 examples. However, once trained, the agents can be reconfigured to compute new prediction results for different feature sets without requiring retraining, unlike a monolithic MLP.

Accuracy results from five configurations of our social, agent-based prediction system can be found in Figure 2b. We show results for three aggregation mechanisms after 50 epochs of training: (1) unweighted equal voting, (2) votes assigned based on initial classifier performance, and (3) votes assigned based on classifier performance and agent trust (described in Section 2.3), with method (3) consistently out-performing methods (1) and (2). Method (1) is represented by the blue bar, method (2) by the orange bar, and method (3) by the grey bar. We tested similar configurations across our agents and MLP networks. Data from Movielens and TMDb were combined, as described in Section 2.1, with no agent receiving information from only one source.

Similar to Figure 2a, Figure 2b's labels show which features were included in the interaction. Feature distribution across agents was optimized for accuracy within the limits of available features. Five agents participated in each interaction, with the budget, vote\_average and budget, vote\_count interactions comprising five copies of the same agent.

WoC-Bots out-performed the MLP classifier in all cases except where final revenue was included as a feature, indicating that our aggregation method does not give enough weight to an agent with exceptionally good performance. We tested removing a highly correlated (http://ibomalkoc.com/movies-dataset/) feature, vote\_count, which caused a performance decline in both the MLP and the social agents, with the MLP network accuracy declining 4% compared to a decline of 1.9% for WoC-Bots, indicating our agents are more resistant to feature drop-out. We also tested removing an unimportant feature, runtime, which showed a 1.7% performance increase in the MLP network and only a 0.3% increase for WoC-Bots, indicating poorly performing agents have little impact on other agents during the interaction period and receive few votes during opinion aggregation. Statistical analysis confirms that runtime is not highly correlated in either the TMDb dataset or an ensemble dataset combining the Movielens and TMDb data, as used in this article [21,22].

Figure 3 shows the performance of MLP networks and WoC-Bots as features are systematically added. The results presented in this figure are produced via the classifier performance & trust aggregation mechanism. The agents are configured to allow for maximum agent participation without duplicating agents in any simulation testing more than two features. Four copies of an agent, representing budget and vote\_count, participated in the first simulation. Four agents participated in the budget, vote\_count, popularity simulation, and eleven agents participated in each of the following four-feature simulations. Twenty-six agents participated in the budget, vote\_count, vote\_average, runtime, popularity simulation, with one agent receiving five features, five agents receiving four features, 10 agents receiving three features, and 10 agents receiving two features.

**Figure 3.** Accuracy of MLP vs WoC-Bots w/Max agent configuration while adding features.

In five out of six simulations WoC-Bots out-performed the MLP network, and significantly out-performed the MLP network when the most important feature, budget, was removed. WoC-Bots performed best when all features were available, and when the maximum number of unique agents were participating in the simulation. The MLP network performed best when the three most highly correlated features were the only features being considered. This performance difference indicates WoC-Bots are able to gather additional information from features that are less correlated with revenue without a net negative impact to their accuracy from the additional feature noise.

Figure 4 shows the accuracy for training epochs 1–50 for an MLP network and WoC-Bots. The network was configured with five features: budget, vote\_count, vote\_average, runtime, and popularity. Five agents participated in the simulation, with four agents receiving two features and one agent receiving five features. Each two-feature agent received budget as a feature plus one other feature from the list. No agent was duplicated. We chose this agent and feature configuration to make as fair and direct a comparison with an MLP network as possible, despite WoC-Bots performing better when more agents participate in the simulation, as seen in the budget, vote\_count, vote\_average, runtime, popularity simulation in Figure 3, where 26 agents were allowed to participate. WoC-Bots are slightly out-performed in this configuration by the MLP network when trained for more than 40 epochs; however, they are able to integrate information more quickly than the MLP network, reaching an optimum (76.3%) at 20 epochs vs. the MLP optimum (76.8%) at 40 epochs.

**Figure 4.** Accuracy of MLP and WoC-Bots for five features over 1–50 epochs.

#### **4. Discussion**

Our system can easily include new features by creating a new agent to represent those features, allowing the agent to be added to the next "interaction" without the need to re-train a network or employ the complex, dynamically expandable networks found in [2] when incorporating new data. Additionally, this design allows us to quickly test the impact of removing features, or of various feature combinations, without the time-consuming re-training step required when changing features in a full MLP network. This allows us to easily remove features, like runtime, to test how they impact the final prediction.
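To illustrate this modularity, consider the toy sketch below (hypothetical class names, not the authors' code): adding a feature means adding an agent, and dropping a feature means excluding the agents that carry it, with no retraining of the others:

```python
class WocBot:
    """Toy stand-in for an agent: one small MLP per feature subset."""
    def __init__(self, features):
        self.features = features  # e.g. ("budget", "vote_count")

class Swarm:
    def __init__(self):
        self.agents = []

    def add_agent(self, agent):
        # A new feature set joins the next interaction round as a
        # new agent; existing agents are untouched.
        self.agents.append(agent)

    def without_feature(self, feature):
        # Testing feature removal = excluding the agents that use it,
        # instead of retraining a monolithic network.
        s = Swarm()
        s.agents = [a for a in self.agents if feature not in a.features]
        return s
```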

WoC performance, or a computer-agent-based version of it, depends on two attributes: a diverse, independent crowd, and an aggregation mechanism that assigns appropriate weights to individuals within the crowd to reach the correct collective decision [23]. We also know from Othman [5] that it is not reasonable to develop a computer agent that accurately represents the intricate and diverse knowledge that humans have. Therefore, we need to find the right balance between independently thinking agents within a crowd and information sharing to better represent the diverse knowledge of human agents.

#### **5. Conclusions & Future Work**

We have demonstrated a robust, flexible alternative to traditional ANN methods for making predictions about specific future events. Our implementation takes ideas from prediction markets, wisdom of crowds, and multi-agent systems to use simple, modular agents in a social setting to answer binary questions. Our results show that we can attain results similar to those of a multilayer perceptron when classifying Hollywood movies, while requiring less training time and offering more flexibility and prediction options. Further, our system is robust, demonstrating only a 1.9% loss in accuracy when losing the vote\_count feature versus a 4% loss in accuracy when the same feature was removed from the MLP network.

We have three main areas to focus on for improvement in the future. Recent work on deep neural networks is starting to explain "why" we get a certain output; however, there is still a long way to go before we have the ability to easily answer this question [24]. Our system offers a framework, using a multi-agent approach, that should allow us to answer "why" more easily: at the core of each agent is a *very* simple, single hidden layer MLP. Agents track all interactions, how those interactions affect their internal belief, and how they change the trust value of other agents. Given the state of the system during a simulation, the internal belief and trust scores of each agent, and each agent's interaction history, we can follow the history of each agent, starting with its initial belief post-classification and moving through each interaction, allowing us to see when and why an agent's belief changed (or stayed the same).

The two other areas we will address in future work are (1) the interaction, movement, and initialization algorithms, allowing us to change and optimize the distribution and flow of information and (2) the aggregation mechanism. We will use theories from swarm intelligence [25] to better aggregate the information that each agent possesses in a manner that better extracts information from the **correct** agents while limiting the impact that **incorrect** agents have on the collective opinion. Unanimous A.I. (https://unanimous.ai/) has a unique, swarm-based aggregation method that is capable of arriving at a collective answer. Unanimous A.I. maintains a "human-in-the-loop" approach [26], where their 'swarm' is comprised of humans, answering binary and non-binary questions by working together to move a virtual puck to the collective answer [27]. We prefer a computer-agent-based approach that allows for new agents to be created as needed to answer questions as they come up. Our future work will focus on implementing a swarm-based algorithm to produce an "emergent prediction" from a group of relatively simple, modular agents.

**Author Contributions:** Conceptualization, S.G.; methodology, S.G. and D.B.; software, S.G.; investigation, S.G. and D.B.; resources, S.G. and D.B.; data curation, S.G.; writing–original draft preparation, S.G.; writing–review and editing, S.G. and D.B.; supervision, D.B.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
