Artificial Punishment Signals for Guiding the Decision-Making Process of an Autonomous System

Cabrera-Paniagua, Daniel; Rubilar-Torrealba, Rolando; Castro, Nelson; Taverner, Joaquín

doi:10.3390/app14177595


¹ Escuela de Ingeniería Informática, Universidad de Valparaíso, Valparaíso 2362905, Chile

² Departamento de Industrias, Universidad Técnica Federico Santa María, Valparaíso 2090123, Chile

³ Escuela de Psicología, Universidad Arturo Prat, Victoria 4721189, Chile

⁴ Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València, 46022 València, Spain

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(17), 7595; https://doi.org/10.3390/app14177595

Submission received: 19 July 2024 / Revised: 20 August 2024 / Accepted: 26 August 2024 / Published: 28 August 2024

(This article belongs to the Special Issue Advances in Intelligent Information Systems and AI Applications)


Abstract

Somatic markers have been evidenced as determinant factors in human behavior. In particular, the concepts of somatic reward and punishment have been related to the decision-making process; both reward and somatic punishment represent bodily states with positive or negative sensations, respectively. In this research work, we have designed a mechanism to generate artificial somatic punishments in an autonomous system. An autonomous system is understood as a system capable of performing autonomous behavior and decision making. We incorporated this mechanism within a decision model oriented to support decision making on stock markets. Our model focuses on using artificial somatic punishments as a tool to guide the decisions of an autonomous system. To validate our proposal, we defined an experimental scenario using official data from Standard & Poor’s 500 and the Dow Jones index, in which we evaluated the decisions made by the autonomous system based on artificial somatic punishments in a general investment process using 10,000 independent iterations. In the investment process, the autonomous system applied an active investment strategy combined with an artificial somatic index. The results show that this autonomous system presented a higher level of investment decision effectiveness, understood as the achievement of greater wealth over time, as measured by profitability, utility, and Sharpe Ratio indicators, relative to an industry benchmark.

Keywords: artificial punishment signals; autonomous system; decision making; investment decision

1. Introduction

From a general interpretation, a somatic marker represents an automatic activation of bodily or physiological signals in the presence of stimuli internal or external to the individual. Somatic markers are acquired and developed through the experience of the individual [1]. Examples of the manifestation of a somatic marker are increased heart rate, sweating, muscle twitching, or abdominal pain [1]. It is important to note that these bodily reactions do not occur intentionally but rather as a reflection of the emergence of an emotion in the presence of an internal or external stimulus. When an adverse situation occurs for an individual, a somatic marker can reflect through physiological changes that something is wrong and that it is necessary to update the course of action or make a decision [1,2].

The formation of somatic markers is a process that continues throughout a person’s life, although it acquires special importance during the stages from childhood to adolescence. Likewise, the existence of rewards or punishments as a consequence of decision making shapes each somatic marker. Somatic rewards and somatic punishments represent bodily states with positive or negative sensations, respectively. Within a flow of uncertain decisions, these positive or negative sensations can configure signals that alert, anticipate, or favor some courses of action over others [1]. Whether due to an internal or external stimulus, an individual may experience a reward sensation (e.g., an increase in a feeling of well-being) or a punishment sensation (e.g., an increase in feelings of displeasure, sadness, or anger) [3]. These sensations are experienced in real time, that is, when the stimulus is presented and the individual faces it. Likewise, both the sensations experienced by the individual and the actions the individual takes are recorded as experience and contribute to the formation of somatic markers.

Several studies have been carried out on the effect of somatic punishment on decision making [4,5,6]. Also, to a lesser extent, some other studies have explored ways to artificially implement somatic markers within decision systems [3,7,8]. However, to our knowledge, no proposals have been developed that specifically address the implementation of artificial somatic signals arranged as somatic punishments.

In this paper, we present a new mechanism for generating artificial somatic punishments in an autonomous system. Somatic responses and somatic punishments enhance risk sensitivity, conferring to the system the ability to adapt to dynamic environments. To do that, the present work addresses (1) the design of an artificial somatic punishment generation mechanism for an autonomous system, (2) the definition of a decision model incorporating such a mechanism, (3) the implementation of an evaluation scenario using official and public data from Standard & Poor’s 500 (S&P500) [9] and Dow Jones Index (DJI) [10], which correspond to a group of indices that represent the market value of the companies listed in each of these indices in a time-series format with a daily time unit, and (4) the evaluation of the proposal through the execution of 10,000 independent experiments yielding promising results.

The obtained results show a behavior close to what could be observed in real stock markets, as well as in other research works such as [7,11]. In this sense, a relevant novelty of the current research work lies in the design, implementation, and evaluation of a novel artificial mechanism to be incorporated in autonomous systems, which aims to extend the capability and functionality of autonomous systems for domains where humans can delegate their own decision making.

2. Literature Review

In 1994, Damasio enunciated the somatic marker hypothesis [1], which states that decision making is affected by subtle homeostatic changes, originating in the body and underlying emotions, and it integrates the mind and body in this way. These changes can take the form of sweating, agitation, heart palpitations, abdominal pain, muscle tension, or others and represent signs or somatic markers in the face of situations of risk, whether real or imagined.

Recent research incorporates the use of somatic markers in the study of decision making under stress [12], under uncertainty [13,14,15], concerning the difficulty in identifying and describing emotions or alexithymia [16], in equity profiles and selfishness in financial decisions [17], and in the perception of happiness in social interactions [18]. In the literature, it is also possible to find studies that relate somatic markers to aspects such as moral judgement and decision making [19]. For example, in [20], the effect of somatic markers and emotional intelligence on risky decision making is studied. The influence of these markers on decisions about consumer brands in e-commerce has also been evaluated in [21]. Another interesting example can be found in [22], where somatic markers’ effects on adolescents’ decision-making processes are evidenced. Applied examples can also be found in the field of health, relating somatic markers to decisions regarding suicide [23] or addiction problems [15].

Reward and, particularly, punishment processes influence decision making through emotional responses [24,25,26,27]. Emotional events, expressed in the body, influence decision making through a circuit or “body loop” of feedback afferent to the brain, whose primary channel is the vagus nerve [28]. In the case of opioid users, these emotional reactions appear abnormal [29]. Likewise, it has been seen that the medial prefrontal cortex contributes to the decision-making process that involves the risk of punishment [30]. There is evidence supporting that several neurobiological mechanisms (including the amygdala, GABA, and monoamine neurotransmitters, dopamine, and corticostriatal circuits) influence punishment [31].

On the other hand, different aspects have been studied concerning the stock market and decision making, such as investor over-reaction [32]; optimism, volatility, and decision making [33]; climate risk and stock prices [34]; and relationships between emotions and market anomalies [35,36], to name a few. There are several studies devoted to analyzing the human factors’ influence on investment decision making, for example, personal characteristics [37] and psychological factors [38,39]. In a complementary sense, the idea of developing systems for investment decisions that artificially incorporate affective aspects is not recent [40,41,42,43,44,45,46].

In recent years, work has also been developed relating factors inherently related to human beings, such as personality or somatic markers, to model investment decisions in intelligent systems. For instance, an autonomous system based on artificial somatic markers was presented by [7]. Regarding personality, an artificial autonomous system to support investment decisions using a Big-Five modeling approach was presented in [47]. This model was extended in [11] to integrate personality traits and artificial somatic markers. However, the few known works that have suggested the implementation of artificial somatic markers in autonomous investment systems have not considered the somatic punishment approach. In this sense, the present research focuses its analysis on the implementation and evaluation of an artificial mechanism of somatic punishment, that is, the generation of a negative impact or consequence in an autonomous system when the result of its decisions deviates negatively from its objective.

3. Materials and Methods

3.1. Mathematical Formulations for the Autonomous System

In the stock market domain, profitability is understood as the variation in the price of the market index, determined according to Equation (1):

$$\mathit{Prof}_{t} = \frac{\mathit{SP500}_{t} - \mathit{SP500}_{t-1}}{\mathit{SP500}_{t-1}}, \tag{1}$$

where $\mathit{SP500}_{t}$ is the value of the S&P500 index in period $t$, and $\mathit{SP500}_{t-1}$ is the value of the S&P500 index in period $t-1$. In this sense, the change in the price of the index is understood as a percentage price variation.

For its part, financial risk corresponds to the measurement of the volatility of the financial market, measured as the observed variance in the n previous periods of the return of the S&P500 market index, according to Equation (2):

$$\mathit{Risk}_{t} = \sum_{i=1}^{n} \frac{(\mathit{Prof}_{t-n+i} - E_{t}[\mathit{Prof}])^{2}}{n-1}, \tag{2}$$

where $\mathit{Prof}_{t}$ corresponds to the return of the S&P500 index in period $t$ as given by Equation (1), and $E_{t}[\mathit{Prof}] = \sum_{i=1}^{n} \frac{\mathit{Prof}_{t-n+i}}{n}$ corresponds to the average of the returns of the S&P500 index over the last $n$ periods, where $n$ is determined for the specific application.
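As a rough illustration of Equations (1) and (2), the following Python sketch computes period returns from a price series and the sample variance of the last n returns. The function names and the price values are hypothetical, chosen only for this example; they are not taken from the study's implementation.

```python
from statistics import mean


def profitability(prices):
    """Equation (1): percentage change between consecutive index values."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def risk(returns, n):
    """Equation (2): variance of the last n returns, with divisor n - 1."""
    if n < 2 or len(returns) < n:
        raise ValueError("need n >= 2 and at least n observed returns")
    window = returns[-n:]
    avg = mean(window)  # E_t[Prof] over the same window
    return sum((r - avg) ** 2 for r in window) / (n - 1)


# Hypothetical index values, for illustration only.
prices = [100.0, 102.0, 99.96, 101.9592]
returns = profitability(prices)   # roughly [0.02, -0.02, 0.02]
current_risk = risk(returns, n=3)
```

The window length n is left as an argument because the text states that it is set per application.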

On the other hand, the accumulated wealth is defined according to Equation (3) (see [48]):

$$\mathit{Wealth}_{t} = \mathit{Wealth}_{t-1} (1 + \mathit{Prof}_{t}), \tag{3}$$

where $\mathit{Wealth}_{0} = I_{0}$, and $I_{0}$ represents the initial wealth value that the autonomous system possesses at the beginning of the experimentation. The parameter $\mathit{Prof}_{t}$ is the same as in Equation (1).

On the other hand, the utility of the autonomous system is defined as a function of its wealth, specifically a constant relative risk aversion function commonly used in the economic literature [49,50], and that allows the evaluation of the experimental results from an economic perspective. The functional form is defined below:

$$\mathit{Utility}_{t} = \frac{(\mathit{Wealth}_{t})^{1-\sigma}}{1-\sigma}, \tag{4}$$

where $\mathit{Wealth}_{t}$ corresponds to the wealth of the autonomous system in period $t$ and is defined in Equation (3), and $\sigma$ corresponds to a risk aversion parameter of the autonomous system, which usually takes values greater than 1 in the economic literature. This utility function makes it possible to estimate the fluctuations of the agent’s well-being level over time, which in turn influences the measurement of the agent’s performance.
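To make Equations (3) and (4) concrete, here is a minimal Python sketch that compounds wealth over a sequence of returns and evaluates the constant relative risk aversion utility. The function names, the returns, and the choice of sigma = 2 are assumptions made for illustration; only the starting wealth of 10,000 echoes the experimental scenario.

```python
def wealth_path(initial_wealth, returns):
    """Equation (3): Wealth_t = Wealth_{t-1} * (1 + Prof_t), with Wealth_0 = I_0."""
    path = [initial_wealth]
    for r in returns:
        path.append(path[-1] * (1.0 + r))
    return path


def crra_utility(wealth, sigma):
    """Equation (4): Wealth^(1 - sigma) / (1 - sigma); undefined at sigma = 1."""
    if sigma == 1:
        raise ValueError("Equation (4) is undefined for sigma = 1")
    return wealth ** (1.0 - sigma) / (1.0 - sigma)


# Hypothetical returns and risk aversion parameter.
path = wealth_path(10_000.0, [0.10, -0.50])
utilities = [crra_utility(w, sigma=2.0) for w in path]
```

Note that for sigma greater than 1 and positive wealth, the utility values are negative and rise toward zero as wealth grows, so a higher utility still means a better outcome.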

Finally, we define the Sharpe ratio, a measure that relates the expected value of the return on a financial asset to its standard deviation [51]. It provides a standard against which different asset selection strategies within a portfolio can be compared:

$$\mathit{SharpeRatio} = \frac{E[\mathit{Prof}_{t}] - r_{f}}{\mathit{Sd}[\mathit{Prof}_{t}]}, \tag{5}$$

where $E[\mathit{Prof}_{t}]$ corresponds to the expected value of the return on the financial asset; $r_{f}$ corresponds to the risk-free rate of the economy, which we can consider equal to zero for the experimentation; and $\mathit{Sd}[\mathit{Prof}_{t}]$ corresponds to the standard deviation of the return on the financial asset.
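A short Python sketch of Equation (5) follows. It assumes the sample standard deviation (divisor n - 1), in line with Equation (2), and a risk-free rate of zero as in the experimentation; the function name and the return values are hypothetical.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Equation (5): (E[Prof] - r_f) / Sd[Prof], using the sample deviation."""
    return (mean(returns) - risk_free) / stdev(returns)


# Two hypothetical period returns.
ratio = sharpe_ratio([0.01, 0.03])
```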

3.2. Indexes for Risk, Loss, and Accumulated Loss

This section defines some risk indices that complement the essential decision metrics in the capital market. First, the risk index is expressed through the difference between the observed risk value and an upper-risk tolerance limit, according to Equation (6):

$$\mathit{RIndex}_{t} = \begin{cases} (\mathit{Risk}_{t} - \mathit{TolRisk}_{t})\,\zeta + \mathit{RIndex}_{t}, & \text{if } \mathit{Risk}_{t} + \rho_{R,t} \geq \mathit{TolRisk}_{t} \\ \mathit{RIndex}_{t}, & \text{otherwise,} \end{cases} \tag{6}$$

where $\mathit{Risk}_{t}$ corresponds to the level of risk observed in period $t$ according to Equation (2); $\mathit{TolRisk}_{t}$ corresponds to the risk tolerance that the autonomous system has in period $t$; $\zeta$ corresponds to a sensitivity parameter of the autonomous system to the intensity of the risk, which for this experimentation can take a neutral value equal to 1; and $\rho_{R,t}$ corresponds to a random variable centered on zero that shows the variability perceived by the autonomous system against the risk observed in period $t$, which can take values according to the variability of the capital market. Meanwhile, the loss index is obtained from Equation (7):

$$\mathit{LossIndex}_{t} = \begin{cases} (\mathit{Prof}_{t} - \mathit{TolLoss}_{t})\,\nu + \mathit{LossIndex}_{t}, & \text{if } \mathit{Prof}_{t} + \rho_{L,t} \geq \mathit{TolLoss}_{t} \\ \mathit{LossIndex}_{t}, & \text{otherwise,} \end{cases} \tag{7}$$

where $\mathit{Prof}_{t}$ corresponds to the profitability observed in period $t$ according to Equation (1); $\mathit{TolLoss}_{t}$ corresponds to the tolerance to loss that the autonomous system has in period $t$; $\nu$ corresponds to a sensitivity parameter of the autonomous system to the magnitude of the loss, which for this experimentation can take a neutral value equal to 1; and $\rho_{L,t}$ corresponds to a random variable centered on zero that shows the variability perceived by the autonomous system against a loss observed in period $t$, which can take values according to the profitability observed in the capital market for a given period of time. For its part, the index of accumulated loss in a period is determined according to Equation (8):

$$\mathit{CLossIndex}_{t} = \begin{cases} (\mathit{Prof}_{t} - \mathit{TolCLoss}_{t})\,\eta + \mathit{CLossIndex}_{t}, & \text{if } \mathit{Prof}_{t} + \rho_{C,t} \geq \mathit{TolCLoss}_{t} \\ \mathit{CLossIndex}_{t}, & \text{otherwise,} \end{cases} \tag{8}$$

where $\mathit{Prof}_{t}$ corresponds to the profitability observed in period $t$ according to Equation (1); $\mathit{TolCLoss}_{t}$ corresponds to the tolerance of the autonomous system before the accumulated loss in period $t$; $\eta$ corresponds to a sensitivity parameter of the autonomous system to the level of accumulated loss in a period of time, which for this experimentation can take a neutral value equal to 1; and $\rho_{C,t}$ corresponds to a random variable centered on zero that shows the variability perceived by the autonomous system against the accumulated loss in period $t$.
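Equations (6) to (8) share one form: an index accumulates the gap between an observed quantity and its tolerance whenever the noise-perturbed observation reaches the tolerance. The Python sketch below captures that shared form, reading each equation as an update rule in which the right-hand index is the value before the update (an interpretation, since the equations use the same subscript on both sides). The function name, the numeric values, and the Gaussian choice for the zero-centered noise are all assumptions.

```python
import random


def update_index(index, observed, tolerance, sensitivity=1.0, noise=0.0):
    """Shared form of Equations (6)-(8), read as an update rule.

    The trigger uses observed + noise, while the increment uses the
    unperturbed gap (observed - tolerance), as the equations are written.
    """
    if observed + noise >= tolerance:
        return index + (observed - tolerance) * sensitivity
    return index


# Hypothetical values: observed risk above its tolerance raises the index.
r_index = update_index(0.0, observed=0.0004, tolerance=0.0003)

# Observation below the tolerance and no noise: the index is unchanged.
unchanged = update_index(0.5, observed=0.0002, tolerance=0.0003)

# Zero-centered noise drawn from a Gaussian (an assumed distribution).
rng = random.Random(42)
noisy = update_index(0.5, 0.0002, 0.0003, noise=rng.gauss(0.0, 1e-4))
```

One consequence of the form as written is that a large enough noise draw can trigger an update while the observation is still below the tolerance, in which case the increment is negative.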

3.3. Somatic Index and Punishment Factor

The risk index, the loss index, and the accumulated loss index are considered factors that relate the evolution of financial assets to the artificial somatic reaction that the autonomous system can experience. In this sense, the somatic index (SIndex) is defined according to Equation (9):

$$\mathit{SIndex}_{t} = \sum_{i=1}^{3} \mathit{INDEX}_{i,t} W_{i} + \mathit{SIndex}_{t}, \tag{9}$$

where $\mathit{INDEX}_{i,t}$ is the $i$-th element of the vector $(\mathit{RIndex}_{t}, \mathit{LossIndex}_{t}, \mathit{CLossIndex}_{t})$, and $W_{i}$ corresponds to the weight of that element; for this research the weights can take equivalent values, considering a neutral experimentation. It is important to mention that when $\mathit{SIndex}$ reaches or exceeds the $\mathit{MaxSIndex}$ threshold, $\mathit{SIndex}$ is restored to its initial value. Likewise, the investment portfolio is completely reconfigured. The $\mathit{RIndex}$, $\mathit{LossIndex}$, and $\mathit{CLossIndex}$ indices are also restored to their respective initial values. Associated with $\mathit{SIndex}_{t}$, a variable called $\mathit{punishment}_{t}$ is established, which is increased by 1 each time $\mathit{SIndex}_{t}$ reaches or exceeds the $\mathit{MaxSIndex}$ threshold, according to Equation (10). If the autonomous system experiences a number of periods $T$ without $\mathit{SIndex} \geq \mathit{MaxSIndex}$, then $\mathit{punishment}_{t} = 0$; that is, $\mathit{punishment}_{t}$ is restored to its original value:

$$\mathit{punishment}_{t} = \begin{cases} \mathit{punishment}_{t} + 1, & \text{if } \mathit{SIndex}_{t} \geq \mathit{MaxSIndex} \\ \mathit{punishment}_{t}, & \text{otherwise.} \end{cases} \tag{10}$$

The punishment that the autonomous system can generate is indirectly linked to the financial indicators. Furthermore, the punishment observed by the system provides stability criteria for the decision, since poor financial results have an impact on the somatic index, leading to a portfolio reconfiguration. This ensures that a poor financial decision made by the autonomous agent is not permanent.
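The bookkeeping described in this section (Equations (9) and (10), the reset of the somatic index, and the return of the punishment counter to zero after T quiet periods) can be sketched in Python as follows. The class and function names, the initial value of zero, the equal weights, and the way quiet periods are counted are assumptions made for this illustration, not details confirmed by the text.

```python
from dataclasses import dataclass


@dataclass
class SomaticState:
    s_index: float = 0.0     # assumed initial value of SIndex
    punishment: int = 0
    calm_periods: int = 0    # periods since the last threshold event


def somatic_step(state, indices, weights, max_s_index, calm_limit):
    """Apply Equations (9) and (10) for one period.

    Returns True when SIndex reaches MaxSIndex, which is the signal to
    reconfigure the portfolio and reset the three component indices.
    """
    state.s_index += sum(i * w for i, w in zip(indices, weights))  # Eq. (9)
    if state.s_index >= max_s_index:
        state.punishment += 1        # Eq. (10)
        state.s_index = 0.0          # restored to its initial value
        state.calm_periods = 0
        return True
    state.calm_periods += 1
    if state.calm_periods >= calm_limit:   # T periods without an event
        state.punishment = 0
    return False


# Hypothetical run with equal weights and MaxSIndex = 1.
state = SomaticState()
weights = [1 / 3] * 3
events = [
    somatic_step(state, [0.9, 0.9, 0.9], weights, 1.0, calm_limit=3),
    somatic_step(state, [0.6, 0.6, 0.6], weights, 1.0, calm_limit=3),
]
```

In this run the first period stays below the threshold and the second crosses it, so the punishment counter becomes 1 and the somatic index returns to zero.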

3.4. Tolerance to Risk, Loss, and Accumulated Loss, and Performance Evaluation

Risk tolerance, loss tolerance, and cumulative loss tolerance are time-varying dimensions, according to Equations (11), (12), and (13), respectively:

$$\mathit{TolRisk}_{t} = \begin{cases} \alpha \beta^{\mathit{punishment}_{t}} + C, & \text{if } \mathit{SIndex}_{t} \geq \mathit{MaxSIndex} \\ \mathit{TolRisk}_{t}, & \text{otherwise,} \end{cases} \tag{11}$$

$$\mathit{TolLoss}_{t} = \begin{cases} \gamma \varphi^{\mathit{punishment}_{t}}, & \text{if } \mathit{SIndex}_{t} \geq \mathit{MaxSIndex} \\ \mathit{TolLoss}_{t}, & \text{otherwise,} \end{cases} \tag{12}$$

$$\mathit{TolCLoss}_{t} = \begin{cases} \delta \omega^{\mathit{punishment}_{t}}, & \text{if } \mathit{SIndex}_{t} \geq \mathit{MaxSIndex} \\ \mathit{TolCLoss}_{t}, & \text{otherwise,} \end{cases} \tag{13}$$

where $\alpha$, $\gamma$, and $\delta$ correspond to constants that define the initial tolerance of the autonomous system to risk, loss, and accumulated loss, respectively. On the other hand, $\beta^{\mathit{punishment}_{t}}$, $\varphi^{\mathit{punishment}_{t}}$, and $\omega^{\mathit{punishment}_{t}}$ correspond to factors that represent the sensitivity of the autonomous system to risk, loss, and accumulated loss, which vary as the autonomous system reaches the $\mathit{MaxSIndex}$ threshold. All these parameters must be calibrated for each experimental scenario to ensure the stability of the system. For all periods $t$, the values of the factors $\beta^{\mathit{punishment}_{t}}$, $\varphi^{\mathit{punishment}_{t}}$, and $\omega^{\mathit{punishment}_{t}}$ lie in the interval $[0, 1]$.
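The tolerance rules in Equations (11) to (13) all scale an initial constant by a factor raised to the punishment count. A minimal Python sketch of that shared rule is given below; the function name and the numeric values are hypothetical, and the additive constant is only used for the risk tolerance of Equation (11).

```python
def tolerance(initial, factor, punishment, offset=0.0):
    """initial * factor**punishment + offset, the form of Equations (11)-(13).

    `factor` plays the role of beta, phi, or omega and is kept in [0, 1];
    `offset` corresponds to the constant C in Equation (11).
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor is expected to lie in [0, 1]")
    return initial * factor ** punishment + offset


# Hypothetical parameters: the tolerance halves with each punishment.
levels = [tolerance(0.02, 0.5, p) for p in range(4)]
```

With a factor below 1, each additional punishment shrinks the tolerance geometrically, so later threshold events are reached more easily. With an offset, the risk tolerance approaches that offset rather than zero.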

To measure the performance of the described model, we can generate an indicator that can be compared with a benchmark. In this sense, we can use the economic literature to generate an evaluation of the level of utility, which we describe as follows:

$$\mathit{TotPerformance}_{T} = \frac{\int_{0}^{T} \mathit{Utility}_{AA,t}\, e^{-\kappa t}\, dt}{\int_{0}^{T} \mathit{Utility}_{Bench,t}\, e^{-\kappa t}\, dt} = \frac{\mathit{TotUt}_{AA,T}}{\mathit{TotUt}_{Bench,T}}, \tag{14}$$

where $\int_{0}^{T} \mathit{Utility}_{AA,t}\, e^{-\kappa t}\, dt$ and $\int_{0}^{T} \mathit{Utility}_{Bench,t}\, e^{-\kappa t}\, dt$ correspond to the integrals that generate the utility of the autonomous system model and a benchmark between an initial period 0 and a period $T$, respectively. Note that the utility for any moment in time is multiplied by $e^{-\kappa t}$, where $\kappa$ corresponds to the intertemporal discount rate which the autonomous system faces and is strongly related to the market interest rates existing at that moment in time. This approximation allows us to compare the outcome of the autonomous system with a passive alternative, which in turn allows us to find the best alternative calibration of the model, considering the time effects of an investment decision.
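With daily data, the integrals in Equation (14) have to be approximated numerically. The Python sketch below uses a simple left Riemann sum over equally spaced observations, which is one possible discretization and not necessarily the one used in the study; the function names and input values are hypothetical.

```python
import math


def discounted_utility(utilities, kappa, dt=1.0):
    """Left Riemann sum for the integral of Utility_t * exp(-kappa * t) dt."""
    return sum(u * math.exp(-kappa * k * dt) * dt
               for k, u in enumerate(utilities))


def total_performance(system_utils, bench_utils, kappa, dt=1.0):
    """Equation (14): ratio of discounted utility totals."""
    return (discounted_utility(system_utils, kappa, dt)
            / discounted_utility(bench_utils, kappa, dt))


# Hypothetical utility paths (negative, as CRRA with sigma > 1 produces).
ratio = total_performance([-1.0, -1.0], [-2.0, -2.0], kappa=0.0)
```

Because CRRA utilities with sigma greater than 1 are negative, both totals are negative in that setting, and a ratio below 1 then corresponds to a higher (less negative) discounted utility for the autonomous system.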

The algorithmic sequence of the investment decision-making process is presented below. A graphical representation of this algorithm is shown in Figure 1. Meanwhile, Figure 2 shows a representation of the general process, including the following system components: Domain Manager, Market Analyzer, Index Analyzer, Tolerance Analyzer, and Somatic Engine.

Algorithm 1 General decision-making process for an autonomous system

Input: updated market data.

Output: an investment decision; updated tolerance level to risk, loss, and accumulated loss.

Get market data
Update market indicators: profitability and risk [using Equations (1) and (2)]
Update accumulated wealth [using Equation (3)]
Update utility [using Equation (4)]
Determine $\mathit{RIndex}_{t}$ [using Equation (6)]
Determine $\mathit{LossIndex}_{t}$ [using Equation (7)]
Determine $\mathit{CLossIndex}_{t}$ [using Equation (8)]
Get somatic reaction [using Equation (9)]
if $\mathit{SIndex}_{t} \geq \mathit{MaxSIndex}$ then
    $\mathit{punishment}_{t} = \mathit{punishment}_{t} + 1$
    Update (portfolio)
else
    [Maintain $\mathit{punishment}_{t}$ value]
end if
Determine risk tolerance [using Equation (11)]
Determine loss tolerance [using Equation (12)]
Determine accumulated loss tolerance [using Equation (13)]
Inform to market (investment decision)

The algorithm requires obtaining updated market data. Based on these data, the profitability and risk indicators are updated, along with the accumulated wealth and utility. Then, from the determination of indices, the somatic reaction is obtained (represented by the variable $\mathit{SIndex}$). If this somatic reaction is equal to or greater than the tolerance limit represented by the $\mathit{MaxSIndex}$ variable, the punishment variable is increased, the investment portfolio is updated, and the $\mathit{SIndex}$ variable is restored to its initial value.
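The control flow of Algorithm 1 can be sketched for a single period in Python as shown below. This is a simplified illustration under several assumptions: the noise terms are omitted, the sensitivities are 1, the weights are equal, the initial indices are zero, the initial risk tolerance is taken as alpha + C, the risk value is supplied by the caller, and every numeric default is a hypothetical placeholder in arbitrary units instead of a value from Table 1. The names `Agent` and `step` are not from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical placeholders, not calibrated parameters.
    alpha: float = 0.1     # initial risk tolerance
    beta: float = 0.5      # risk factor, in [0, 1]
    c: float = 0.0         # constant C in Equation (11)
    gamma: float = 1.0     # initial loss tolerance
    phi: float = 0.5
    delta: float = 1.0     # initial accumulated-loss tolerance
    omega: float = 0.5
    max_s_index: float = 1.0
    wealth: float = 10_000.0
    punishment: int = 0
    s_index: float = 0.0
    indices: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    tolerances: list = field(default_factory=list)

    def __post_init__(self):
        self.tolerances = [self.alpha + self.c, self.gamma, self.delta]


def step(agent, period_return, risk):
    """One pass of Algorithm 1; returns 'reconfigure' or 'hold'."""
    agent.wealth *= 1.0 + period_return                  # Equation (3)
    observed = (risk, period_return, period_return)      # inputs of Eqs. (6)-(8)
    for k, (value, tol) in enumerate(zip(observed, agent.tolerances)):
        if value >= tol:                                 # noise omitted
            agent.indices[k] += value - tol              # sensitivity = 1
    agent.s_index += sum(agent.indices) / 3.0            # Equation (9)
    if agent.s_index < agent.max_s_index:
        return "hold"
    agent.punishment += 1                                # Equation (10)
    agent.s_index = 0.0                                  # restore SIndex
    agent.indices = [0.0, 0.0, 0.0]                      # restore indices
    p = agent.punishment
    agent.tolerances = [agent.alpha * agent.beta ** p + agent.c,  # Eq. (11)
                        agent.gamma * agent.phi ** p,             # Eq. (12)
                        agent.delta * agent.omega ** p]           # Eq. (13)
    return "reconfigure"


agent = Agent()
decision = step(agent, period_return=0.01, risk=3.4)     # hypothetical inputs
```

The portfolio update itself is left to the caller, since the text does not specify how the new portfolio is chosen; the returned string only signals when a reconfiguration is due.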

The algorithmic contribution lies in the possibility of reflecting human reactions in autonomous systems, specifically the impact of market performance on the somatic reaction, and its consequent reflection in somatic punishment, which triggers an increase in the sensitivity of the autonomous system to risk, loss, and accumulated loss. In other words, the availability and regulation of somatic reactions and somatic punishments give the autonomous system the ability to adapt to the environment and its dynamic market conditions.

4. Experimental Scenario

4.1. Scenario Description

The experimental scenario is based on an autonomous system that invests from 1 January 2017 to 31 December 2021, with a daily monitoring of the profitability evolution of financial assets. A set of 28 stocks belonging to the Dow Jones Index was considered for configuring investment portfolios. Likewise, to contrast the results obtained by the autonomous system concerning an industry benchmark, data from the S&P500 index from 1 January 2017 to 31 December 2021 are considered with a daily monitoring of their profitability. Only with the purpose of generating the first investment portfolio, the autonomous system considers, as a reference, data belonging to the Dow Jones Index from 1 January 2016 to 31 December 2016. Finally, for the current experimental scenario, the value of the $\mathit{MaxSIndex}$ variable was set to 1.

For this experimental scenario, we proceed to use a characterization of parameters imposed by the researchers, which in principle seems to have a behavior similar to that which could be observed in a human being. However, the parameters are arbitrary and the only aim of this characterization is to provide a stable scenario that can be compared with other specifications. The proper estimation of each of the parameters corresponds to a line of research that is not part of this study but can be included in future research, following Bayesian or frequentist statistical methodologies.

Table 1 shows the parameters used in the experimentation, sketching a precise scenario for analysis. These parameters are linked to the characteristics of the autonomous system, determining the effects of risk, losses, and cumulative losses indicators in the somatic index, predominantly based on the observed profitability in financial markets. With this in mind, the objective is to emulate, through an autonomous system, the effect that can be observed in real individuals when monitoring market profitability, which may trigger somatic reaction effects resulting in changes in the investment portfolio’s composition (see Equations (8)–(12)).

4.2. Experimental Results

The general investment process was performed 10,000 times independently to ensure the stability of the distribution function of the results. In both investment modalities (autonomous system and benchmark), USD 10,000 are available to begin the investment process. The obtained experimental results are shown in Table 2, representing the last observed value of the indicator for a specific year, given a sequence of daily returns monitored by the autonomous system. The column “Investment Metric” indicates the investment metrics considered in the present work: accumulated wealth, utility, and Sharpe ratio considering as financial assets the S&P500 index and the average of the evolution of the portfolio generated by the autonomous agent. For its part, the value for each investment metric available year by year corresponds to the final value that the said metric reached on the last business day of said year.

At the end of 2017, the benchmark outperformed the autonomous system in all investment metrics. At the beginning of 2018, the autonomous system recovered and outperformed the benchmark, then fell and recovered again, reaching a better result than the benchmark in both investment metrics at the end of 2018. The superiority of the autonomous system during 2019 and 2020 is evident. After the first half of 2021, the autonomous system began to perform worse, and at the end of 2021 the benchmark reached and slightly exceeded the autonomous system in both investment metrics.

Figure 3 and Figure 4 show the behavior of accumulated wealth and utility, respectively. The observable shadow on the autonomous system curve represents the variability along the time of the results obtained. Figure 3 shows that the evolution of accumulated wealth associated with the autonomous system decreased in mid-2017, representing inefficient decision making when configuring the investment portfolio, which changes when reconfiguring the portfolio at the end of 2018. The greater degree of efficiency is reflected at the beginning of the year 2020 when the fall of the financial markets caused by the financial crisis of COVID-19 was contained by the autonomous system by timely reconfiguring the investment portfolios. Meanwhile, Figure 4 shows the evolution of the level of utility, which more clearly shows the softening of the utility of the autonomous system in the year 2020 (corresponding to the COVID crisis).

From Table 2 and Figure 4, we can observe how the performance metric of the cumulative utility level starts to improve over time, implying that the strategy proposed for the autonomous system provides greater stability when considering the full history of the investments made by the autonomous system.

On the other hand, Figure 5 shows a sample path that illustrates the behavior of $\mathit{SIndex}$ associated with one of the 10,000 independent experimental runs. Figure 5 shows that at the end of 2018, the $\mathit{SIndex}$ variable reaches and exceeds the threshold represented by the $\mathit{MaxSIndex}$ variable, which triggers both a portfolio change and a restoration of the $\mathit{SIndex}$ variable, which visibly decreases its current value by the following investment period. Subsequently, it is possible to observe that a new maximum occurred in the first quarter of 2020, followed by three more events during the same year. During the year 2021, an equivalent level of maximums was observed.

5. Discussion

Regarding the experimental scenario, it is important first to distinguish the mode of operation of each investment option. On the one hand, the autonomous system adopts an active investment strategy, which means that it modifies its investment portfolio according to how it perceives its internal tolerance levels for risk, loss, and accumulated loss. In contrast, the S&P500 benchmark option represents a passive investment strategy: the investment is made only once in the index at the beginning of the experiment and is not changed throughout the experiment.

Regarding the experimental results, a better behavior of the performance of the autonomous system is generally observed in the active investment strategy over the passive strategy option (i.e., following the benchmark), both in accumulated wealth and in utility. That said, the third quarter of 2021 sees a drop in performance on the active investment strategy. From the first quarter of 2020 to the second quarter of 2021, the high frequency of the portfolio’s reconfiguration made it possible to maintain the values of the investment metrics positively. However, this high frequency of portfolio reconfiguration did not allow the autonomous system to fully adapt to the high volatility conditions in the markets derived from COVID-19.

This research uses standard criteria for calculating return and risk, all according to the modern Markowitz portfolio theory [52]. The risk, loss, and accumulated loss indices are considered in their calculation, as well as the tolerance level to risk, loss, and accumulated loss, respectively. In this sense, the relationship between risk tolerance and greater volatility could increase the risk index. For its part, the relationship between the tolerance level for loss and a capital loss could increase the loss index. Meanwhile, the interaction between the tolerance level to accumulated loss and a sustained occurrence of capital losses could increase the index of accumulated loss.

The previously mentioned indices are considered when calculating the somatic index, called $\mathit{SIndex}$. This index is intended to reflect, period by period, the effect of the decisions made by the autonomous system during the investment process. Every time the $\mathit{SIndex}$ variable reaches or exceeds the threshold represented by the $\mathit{MaxSIndex}$ variable in the autonomous system, an artificial somatic signal is generated, which induces an increase in the punishment variable. This variable seeks to represent bodily states with negative sensations in accordance with the somatic marker hypothesis. A higher frequency of increase in the punishment variable directly affects risk tolerance, loss tolerance, and accumulated loss tolerance, which can affect the $\mathit{SIndex}$ variable. For this purpose, we have carried out 10,000 independent simulations in order to have a distribution of the results and to understand the expected value as the main indicator that determines the efficiency of the proposed algorithm in this research.

On the other hand, it is important to mention that the tolerance levels for risk, loss, and accumulated loss are conditioned by the history of the autonomous system, which is reflected in the variation of the risk, loss, and accumulated loss, respectively. In other words, the autonomous system lowers its risk tolerance when market volatility reaches or exceeds its current level of risk tolerance, lowers its loss tolerance when it experiences a capital loss that exceeds its current level of loss tolerance, and lowers its accumulated loss tolerance when its accumulated loss exceeds its accumulated loss tolerance threshold.
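A minimal Python sketch of this history-dependent adaptation follows. The multiplicative rule and the `decay` factor of 0.9 are assumptions chosen for illustration, as is the function name; the sketch is not the update rule defined by the model's equations. For simplicity it uses the "reaches or exceeds" comparison for all three tolerances.

```python
def lower_tolerance(tolerance, observed, decay=0.9):
    """Return the tolerance to use in the next period.

    If the observed value (volatility, loss, or accumulated loss) reaches or
    exceeds the current tolerance, shrink the tolerance by `decay` (assumed
    multiplicative rule); otherwise leave it unchanged.
    """
    if observed >= tolerance:
        return tolerance * decay
    return tolerance


risk_tolerance = 0.02  # made-up starting tolerance
for volatility in [0.010, 0.025, 0.019, 0.015]:
    risk_tolerance = lower_tolerance(risk_tolerance, volatility)
    print(f"volatility={volatility:.3f} -> risk tolerance={risk_tolerance:.4f}")
```

Because the tolerance only ever decreases here, repeated volatile periods make the system progressively more sensitive, which is one way to read the over-punishment risk noted in the conclusions.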

Additionally, this research incorporated a new efficiency metric based on the computed utility, or welfare, of the autonomous system. This metric differs from those usually used in the market, such as the direct measurement of the result of the investment process (accumulated wealth). It treats the autonomous system not as an entity that measures a single result at a specific point in time but as an agent whose measurement includes the history of the whole process. This strategy makes it possible to seek alternatives that maximize utility, prioritizing the stability of the results in a scheme that tries to emulate the behavior that a human being might have when faced with investment decisions.

The mechanism for generating artificial somatic punishments in autonomous systems is designed to reflect the negative sensations that can emerge in people when they make decisions. These feelings of punishment can guide the decision-making process of an autonomous system. The technological potential of this approach lies in the option of delegating human decision making to artificial devices, objects, or systems that can represent the interests of people and society in decision environments of higher complexity.

In this research, we present a way of measuring the total performance of the investment strategy of the autonomous agent (Equation (14)). It compares the evolution of the utility levels generated over time by the autonomous agent with those that would be generated by a passive strategy based on investing in a stock market index. This way of measuring performance takes into account the impact of the volatility of each strategy on the evolution of the price level and gives more importance to events close in time, which seeks to simulate the decision making of a human being. The metric incorporates information for long-term decision making and generally favors strategies that smooth investment results over time. In this respect it is similar to other indicators such as the Sharpe ratio, but whereas the Sharpe ratio is a static measurement of financial results, the proposed metric allows for a dynamic measurement.
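The idea of a utility ratio that weights recent periods more heavily can be illustrated with a short Python sketch. It is not a reproduction of Equation (14): the geometric weighting `omega ** (T - 1 - t)`, the default `omega = 0.9`, the function names, and the utility series are all assumptions made for the example.

```python
def discounted_total(utilities, omega=0.9):
    """Sum utilities with weight omega**(T - 1 - t): recent periods count more."""
    T = len(utilities)
    return sum(omega ** (T - 1 - t) * u for t, u in enumerate(utilities))


def total_performance(agent_utils, bench_utils, omega=0.9):
    """Ratio of recency-weighted utility; values above 1 favor the agent."""
    return discounted_total(agent_utils, omega) / discounted_total(bench_utils, omega)


agent = [1.0, 1.1, 1.2, 1.3]  # made-up smooth utility path
bench = [1.6, 0.6, 1.8, 0.6]  # made-up volatile path with the same plain sum
print(total_performance(agent, bench, omega=1.0))  # no recency weighting
print(total_performance(agent, bench, omega=0.9))  # recent periods weigh more
```

With `omega = 1.0` the two made-up paths are indistinguishable because their plain sums coincide. With `omega < 1` the ratio depends on when the utility was earned, which is the dynamic aspect that a single static indicator does not capture.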

The obtained results show a behavior close to what could be observed in real capital market investment contexts, as well as in other works, such as [7,11,42,47]. Nevertheless, the present research explores the behavior of an autonomous system with a built-in artificial somatic punishment mechanism, which allows us to make an advancement in the design and implementation of artificial autonomous systems to which humans can delegate their decision making.

To the best of our knowledge, no other proposals consider the implementation and evaluation of an artificial somatic punishment mechanism for an autonomous decision system. In this sense, and in order to obtain a first approximation of the advantages and limitations of such a mechanism, the present research work started its evaluation with a single set of parameters. Expanding this initial set of parameters, for example through Monte Carlo simulations, is therefore an area for future research.

6. Conclusions

This work presented the design of a mechanism for the generation of artificial somatic punishments for an autonomous system. This system was evaluated by considering an experimental scenario based on the use of official data from Standard & Poor’s 500 [9] and Dow Jones [10]. Comparatively, observing all the investment periods, it is possible to affirm that the autonomous system, under an active investment strategy and using an artificial somatic index, presents a higher level of effectiveness of the investment decision, also offering periods of sustained accumulation of wealth and utility. That said, the level of intensity and the degree of influence of artificial somatic punishment on the decision making of the autonomous system is an aspect that requires further tuning and calibration.

We also present a way to measure the overall performance of the autonomous agent’s investment strategy by comparing the evolution of the level of utility generated by the autonomous agent over time with that which would be generated by a passive strategy based on investing in a stock index.

Although the experimental results were obtained from 10,000 independent executions, a limitation of the current research work is that these results derive from a single configuration of parameters (presented in Table 1). Another limitation is that the somatic punishment variable, called “punishment”, reflects a single way of representing punishment in the autonomous system. Previous studies share similar limitations regarding the number of experiments and the definition of a variable through a single central mechanism. In this sense, the present research emphasizes the need for an artificial mechanism to reflect the effects of negative decision consequences on decision makers. In humans, negative sensations derived from decision making can directly influence the next decision-making process, allowing for evaluating, redirecting, or correcting one’s own decisions.

Different future lines of research can be derived from the results obtained in the present work. First, it would be interesting to study the effect of diversifying the decision strategies contained in the autonomous system. Second, new experiments would be needed to avoid or contain excessive punishment (over-punishment) by the autonomous system. This could be carried out through an exploratory analysis in which the punishment variable is calibrated, making it possible to tune its sensitivity and adjust it to different risk contexts. Another potential line of future research is to increase the number of ways in which somatic punishments are updated and study their effects on the autonomous system’s decision-making process, as well as to carry out a convergence analysis of the experimental scenarios as their complexity increases. Finally, it would be appropriate to examine how the proposed model behaves when applied to other highly complex decision scenarios, such as flexible passenger transportation.

Author Contributions

Data curation, D.C.-P., R.R.-T. and J.T.; formal analysis, D.C.-P., R.R.-T., N.C. and J.T.; investigation, D.C.-P., R.R.-T., N.C. and J.T.; methodology, D.C.-P. and R.R.-T.; writing—original draft, D.C.-P., R.R.-T., N.C. and J.T.; writing—review and editing, D.C.-P., R.R.-T., N.C. and J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by ANID Chile through FONDECYT INICIACION project no. 11190370, Generalitat Valenciana CIPROM/2021/077 and the Spanish Government by project ID TED2021-131295B-C32.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Damasio, A. Descartes’ error: Emotion, rationality and the human brain. N. Y. Putnam 1994, 352, 1061–1070.
2. Damasio, A. Self Comes to Mind: Constructing the Conscious Brain; Knopf Doubleday Publishing Group: New York, NY, USA, 2010.
3. Cabrera, D.; Cubillos, C.; Urra, E.; Mellado, R. Framework for incorporating artificial somatic markers in the decision-making of autonomous agents. Appl. Sci. 2020, 10, 7361.
4. Bahník, Š.; Vranka, M.A. Experimental test of the effects of punishment probability and size on the decision to take a bribe. J. Behav. Exp. Econ. 2022, 97, 101813.
5. Iribe-Burgos, F.A.; Cortes, P.M.; García-Hernández, J.P.; Sotelo-Tapia, C.; Hernández-González, M.; Guevara, M.A. Effect of reward and punishment on no-risk decision-making in young men: An EEG study. Brain Res. 2022, 1779, 147788.
6. Suárez-Suárez, S.; Holguín, S.R.; Cadaveira, F.; Nobre, A.C.; Doallo, S. Punishment-related memory-guided attention: Neural dynamics of perceptual modulation. Cortex 2019, 115, 231–245.
7. Cabrera-Paniagua, D.; Rubilar-Torrealba, R. Affective autonomous agents for supporting investment decision processes using artificial somatic reactions. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 677–696.
8. Cominelli, L.; Mazzei, D.; Pieroni, M.; Zaraki, A.; Garofalo, R.; De Rossi, D. Damasio’s somatic marker for social robotics: Preliminary implementation and test. In Biomimetic and Biohybrid Systems, Proceedings of the 4th International Conference, Living Machines 2015, Barcelona, Spain, 28–31 July 2015; Proceedings 4; Springer: Berlin/Heidelberg, Germany, 2015; pp. 316–328.
9. Standard & Poor’s 500 Index. 2022. Available online: https://www.standardandpoors.com (accessed on 1 January 2024).
10. Dow Jones Index. 2022. Available online: https://www.dowjones.com (accessed on 1 January 2024).
11. Cabrera-Paniagua, D.; Rubilar-Torrealba, R. Adaptive intelligent autonomous system using artificial somatic markers and Big Five personality traits. Knowl.-Based Syst. 2022, 249, 108995.
12. Yilmaz, S.; Kafadar, H. Decision-making under stress: Executive functions, analytical intelligence, somatic markers, and personality traits in young adults. Appl. Neuropsychol. Adult 2022, 1–15.
13. Lees, T.; White, R.; Zhang, X.; Ram, N.; Gatzke-Kopp, L.M. Decision-making in uncertain contexts: The role of autonomic markers in resolving indecision. Int. J. Psychophysiol. 2022, 177, 220–229.
14. Lin, C.H.; Huang, J.T.; Chiu, Y.C. Iowa Gambling Task, Somatic Marker Hypothesis, and Neuroeconomics: Rationality and Emotion in Decision Under Uncertainty. Front. Psychol. 2022, 13, 848603.
15. Xu, F.; Huang, L. Electrophysiological measurement of emotion and somatic state affecting ambiguity decision: Evidences from SCRs, ERPs, and HR. Front. Psychol. 2020, 11, 899.
16. Snellman, H. Mind, Body, and Choice: A Review of Alexithymia and the Somatic-Marker Hypothesis. 2022. Available online: https://www.diva-portal.org/smash/get/diva2:1682603/FULLTEXT01.pdf (accessed on 1 January 2024).
17. Miraghaie, A.M.; Pouretemad, H.; Villa, A.E.; Mazaheri, M.A.; Khosrowabadi, R.; Lintas, A. Electrophysiological markers of fairness and selfishness revealed by a combination of dictator and ultimatum games. Front. Syst. Neurosci. 2022, 16, 765720.
18. Chaminade, T.; Spatola, N. Perceived facial happiness during conversation correlates with insular and hypothalamus activity for humans, not robots: An investigation of the somatic marker hypothesis applied to social interactions. Front. Psychol. 2022, 13, 871676.
19. May, J.; Workman, C.I.; Haas, J.; Han, H. The neuroscience of moral judgment: Empirical and philosophical developments. In Neuroscience and Philosophy; The MIT Press: Cambridge, MA, USA, 2022; pp. 17–47.
20. Yip, J.A.; Stein, D.H.; Côté, S.; Carney, D.R. Follow your gut? Emotional intelligence moderates the association between physiologically measured somatic markers and risk-taking. Emotion 2020, 20, 462.
21. Ojha, S.C. Role of somatic markers in consumer durable brand selection in e-retail. Int. J. Bus. Forecast. Mark. Intell. 2020, 6, 1–16.
22. Sandor, S.; Gürvit, H. Development of somatic markers guiding decision-making along adolescence. Int. J. Psychophysiol. 2019, 137, 82–91.
23. Wang, L.; Li, J.; Liu, H.; Wang, Z.; Yang, L.; An, L. Influence Factors for Decision-Making Performance of Suicide Attempters and Suicide Ideators: The Roles of Somatic Markers and Explicit Knowledge. Front. Psychol. 2021, 12, 693879.
24. Miyasaka, M.; Nomura, M. Effect of financial and non-financial reward and punishment for inhibitory control in boys with attention deficit hyperactivity disorder. Res. Dev. Disabil. 2023, 134, 104438.
25. Zu, J.; Xu, F.; Jin, T.; Xiang, W. Reward and Punishment Mechanism with weighting enhances cooperation in evolutionary games. Phys. A Stat. Mech. Its Appl. 2022, 607, 128165.
26. Corvalan, N.; Crivelli, L.; Allegri, R.F.; Pedreira, M.E.; Fernández, R.S. The impact of reward and punishment sensitivity on memory and executive performance in individuals with Amnestic Mild Cognitive Impairment. Behav. Brain Res. 2024, 471, 115099.
27. Liu, J.; Wang, H.; Xing, S.; Liu, X. Sensitivity to reward and punishment in adolescents with repetitive non-suicidal self-injury: The role of inhibitory control. Int. J. Clin. Health Psychol. 2024, 24, 100456.
28. Poppa, T.; Bechara, A. The somatic marker hypothesis: Revisiting the role of the ‘body-loop’ in decision-making. Curr. Opin. Behav. Sci. 2018, 19, 61–66.
29. Biernacki, K.; Terrett, G.; McLennan, S.N.; Labuschagne, I.; Morton, P.; Rendell, P.G. Decision-making, somatic markers and emotion processing in opiate users. Psychopharmacology 2018, 235, 223–232.
30. Orsini, C.A.; Heshmati, S.C.; Garman, T.S.; Wall, S.C.; Bizon, J.L.; Setlow, B. Contributions of medial prefrontal cortex to decision making involving risk of punishment. Neuropharmacology 2018, 139, 205–216.
31. Jean-Richard-Dit-Bressel, P.; Killcross, S.; McNally, G.P. Behavioral and neurobiological mechanisms of punishment: Implications for psychiatric disorders. Neuropsychopharmacology 2018, 43, 1639–1650.
32. Parveen, S.; Satti, Z.W.; Subhan, Q.A.; Jamil, S. Exploring market overreaction, investors’ sentiments and investment decisions in an emerging stock market. Borsa Istanb. Rev. 2020, 20, 224–235.
33. Rocciolo, F.; Gheno, A.; Brooks, C. Optimism, volatility and decision-making in stock markets. Int. Rev. Financ. Anal. 2019, 66, 101356.
34. Zhang, S.Y. Are investors sensitive to climate-related transition and physical risks? Evidence from global stock markets. Res. Int. Bus. Financ. 2022, 62, 101710.
35. Goodell, J.W.; Kumar, S.; Rao, P.; Verma, S. Emotions and stock market anomalies: A systematic review. J. Behav. Exp. Financ. 2023, 37, 100722.
36. Leibbrandt, A.; López-Pérez, R.; Spiegelman, E. Reciprocal, but inequality averse as well? Mixed motives for punishment and reward. J. Econ. Behav. Organ. 2023, 210, 91–116.
37. Reiter-Gavish, L.; Qadan, M.; Yagil, J. Investors’ personal characteristics and trading decisions under distressed market conditions. Borsa Istanb. Rev. 2022, 22, 240–247.
38. Aljifri, R. Investor psychology in the stock market: An empirical study of the impact of overconfidence on firm valuation. Borsa Istanb. Rev. 2023, 23, 93–112.
39. Tiwari, A.K.; Abakah, E.J.A.; Bonsu, C.O.; Karikari, N.K.; Hammoudeh, S. The effects of public sentiments and feelings on stock market behavior: Evidence from Australia. J. Econ. Behav. Organ. 2022, 193, 443–472.
40. Cabrera-Paniagua, D.; Primo, T.T.; Cubillos, C. Distributed stock exchange scenario using artificial emotional knowledge. In Advances in Artificial Intelligence–IBERAMIA 2014, Proceedings of the 14th Ibero-American Conference on AI, Santiago de Chile, Chile, 24–27 November 2014; Proceedings 14; Springer: Berlin/Heidelberg, Germany, 2014; pp. 649–659.
41. Cabrera, D.; Araya, N.; Jaime, H.; Cubillos, C.; Vicari, R.; Urra, E. Defining an affective algorithm for purchasing decisions in E-commerce environments. IEEE Lat. Am. Trans. 2015, 13, 2335–2346.
42. Cabrera, D.; Rubilar, R.; Cubillos, C. Resilience in the decision-making of an artificial autonomous system on the stock market. IEEE Access 2019, 7, 145246–145258.
43. Taverner, J.; Vivancos, E.; Botti, V. A fuzzy appraisal model for affective agents adapted to cultural environments using the pleasure and arousal dimensions. Inf. Sci. 2021, 546, 74–86.
44. Luo, B.; Zeng, J.; Duan, J. Emotion space model for classifying opinions in stock message board. Expert Syst. Appl. 2016, 44, 138–146.
45. Epivent, A.; Lambin, X. On algorithmic collusion and reward–punishment schemes. Econ. Lett. 2024, 237, 111661.
46. Deng, J.; Su, C.; Zhang, Z.M.; Wang, X.P.; Wang, C.P. Evolutionary game analysis of chemical enterprises’ emergency management investment decision under dynamic reward and punishment mechanism. J. Loss Prev. Process. Ind. 2024, 87, 105230.
47. Cabrera-Paniagua, D.; Rubilar-Torrealba, R. A novel artificial autonomous system for supporting investment decisions using a Big Five model approach. Eng. Appl. Artif. Intell. 2021, 98, 104107.
48. Schäl, M. Markov decision processes in finance and dynamic options. In Handbook of Markov Decision Processes: Methods and Applications; Springer: Berlin/Heidelberg, Germany, 2002; pp. 461–487.
49. Claramunt, M.M.; Mármol, M.; Varea, X. Facing a Risk: To Insure or Not to Insure—An Analysis with the Constant Relative Risk Aversion Utility Function. Mathematics 2023, 11, 1070.
50. Zhang, P.; Yan, Z. Optimal portfolio with relative performance and CRRA risk preferences in a partially observable financial market. Appl. Math. Comput. 2024, 481, 128947.
51. Sharpe, W.F. Mutual fund performance. J. Bus. 1966, 39, 119–138.
52. Markowitz, H. Portfolio Selection. J. Financ. 1952, 7, 77–91.

Figure 1. Graphical representation of Algorithm 1.

Figure 2. Diagram representation of the general process.

Figure 3. Accumulated wealth behavior.

Figure 4. Utility behavior.

Figure 5. Example of SIndex behavior.

Table 1. General parameters for the experimental scenario.

Parameter	Value	Equation
$σ$	2	(4)
$ζ$	1	(6)
$ρ_{R,t}$	rand(0, 0.002)	(6)
$ν$	1	(7)
$ρ_{L,t}$	rand(−0.0005, 0.0005)	(7)
$η$	1	(8)
$ρ_{C,t}$	rand(−0.0005, 0.0005)	(8)
$W_1$	0.33	(9)
$W_2$	0.33	(9)
$W_3$	0.34	(9)
$α$	0.005	(11)
$β$	0.9	(11)
$C$	0.002	(11)
$γ$	0.0032	(12)
$φ$	0.9	(12)
$δ$	1	(13)
$ω$	0.9	(13)

Table 2. Experimental results.

Strategy	Investment Metric	2017	2018	2019	2020	2021
Autonomous System	Wealth	11,166	13,554	18,664	22,176	22,962
Autonomous System	Utility	0.96	1.963	2.819	3.018	3.049
Autonomous System	Sharpe Ratio	0.036	0.101	0.149	0.082	0.026
SP500 Benchmark	Wealth	11,930	11,195	14,666	17,838	23,171
SP500 Benchmark	Utility	1.37	0.957	2.277	2.746	3.065
SP500 Benchmark	Sharpe Ratio	0.177	−0.011	0.135	0.049	0.126
Total Performance	$\frac{TotUt_{AA,T}}{TotUt_{Bench,T}}$	0.68	0.89	1.10	1.16	1.12

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

