Article

How Do Humans Recognize the Motion Arousal of Non-Humanoid Robots?

School of Art and Design, Guangdong University of Technology, Guangzhou 510090, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(4), 1887; https://doi.org/10.3390/app15041887
Submission received: 8 January 2025 / Revised: 8 February 2025 / Accepted: 9 February 2025 / Published: 12 February 2025

Abstract

As non-humanoid robots develop and become more involved in human life, emotional communication between humans and robots will become more common. Non-verbal communication, especially through body movements, plays a significant role in human–robot interaction. To enable non-humanoid robots to express a richer range of emotions, it is crucial to understand how humans recognize the emotional movements of robots. This study focuses on the underlying mechanisms by which humans perceive the motion arousal levels of non-humanoid robots. It proposes a general hypothesis: Human recognition of a robot’s emotional movements is based on the perception of overall motion, and is independent of the robot’s mechanical appearance. Based on physical motion constraints, non-humanoid robots are divided into two categories: those guided by inverse kinematics (IK) constraints and those guided by forward kinematics (FK) constraints. Through literature analysis, it is suggested that motion amplitude has the potential to be a common influencing factor. Two psychological measurement experiments combined with the PAD scale were conducted to analyze the subjects’ perception of the arousal expression effects of different types of non-humanoid robots at various motion amplitudes. The results show that amplitude can be used for expressing arousal across different types of non-humanoid robots. Additionally, for non-humanoid robots guided by FK constraints, the end position also has a certain impact. This validates the overall hypothesis of the paper. The expression patterns of emotional arousal through motion amplitude are roughly the same across different robots: the degree of motion amplitude corresponds closely to the degree of arousal. This research helps expand the boundaries of knowledge, uncover user cognitive patterns, and enhance the efficiency of expressing arousal in non-humanoid robots.

1. Introduction

With the advancement of technology [1,2], non-humanoid robots (NHRs) have gradually become involved in human life as the most universally used type of robot, participating in human work and daily routines across various fields: logistics transportation robots widely used in business [3], hotel robots [4], agricultural planting robots [5], construction robots [6], home cleaning robots for household environments [7], pet robots like Cozmo [8] for gaming and companionship, drones [9] for large-scale performances, educational assistant robots [10], and, in the health field, medical support robots [11] such as the biomimetic chick-like robot Keepon [12] and the biomimetic seal robot PARO [13]. All of these play a role in various aspects of people’s lives. Based on the current situation, we can foresee that more production tools and life products will cooperate with humans in the form of NHRs and will play a more extensive and profound role in human life [14]. Consequently, emotional communication between humans and machines will become more common and in-depth. However, NHRs differ greatly in appearance from humans. To enable NHRs to display a richer range of emotions in human–robot interaction, it is necessary to first understand how humans recognize the emotional expressions of NHRs in human–machine communication.
Emotional characteristics are expressed primarily through emotional dimensions. The dimension of emotional arousal represents the intensity of emotions and can intuitively show the degree of a robot’s emotional activation to the user. High arousal is often associated with “extreme” emotional states [15], such as anger, enthusiasm, fear, and surprise, while low arousal is typically related to less “extreme” emotional states, such as happiness, sadness, and interest. The value of arousal lies in the following aspects:
  • Arousal can reflect the subtlety and continuity of emotional changes, thereby improving the efficiency and smoothness of human–robot communication and enhancing the effectiveness of emotional information transmission [16].
  • Arousal can enhance the expressiveness and appeal of robots in interaction, making communication more vivid and persuasive.
  • Arousal helps to elicit emotional resonance and emotional contagion; this emotional synchronization aids in building trust, making it more memorable and thus fostering a better relationship between both parties [17].
  • In emergency situations, emotions with high arousal, such as fear, can quickly mobilize surrounding resources and guide people to make decisions more swiftly [18].
Therefore, the expression of arousal plays a significant role in NHR interaction.
Since NHRs are primarily oriented towards movement functions and exhibit diverse movement patterns and forms, action expression becomes an important way of communication for non-humanoid robots in human–robot interaction. Therefore, the use of actions to express the arousal of emotions is a significant research topic for NHRs.
Previous research has clearly pointed out that the key to emotional expression lies in the mode of movement, not the morphology itself. NHRs are also able to convey emotional states that humans can understand through their actions [19]. Despite the significant differences in physiological structure between robots and humans, which prevent the complete replication of all details of human movement, humans can still perceive emotions from unfamiliar forms and movements. This point is deeply illuminated by Gestalt psychology.
According to the research on Gestalt theory in cognitive psychology [20], human perception of objective objects stems from the overall relationship rather than specific details. As psychologist Rudolf Arnheim explained in “Art and Visual Perception” [21], even non-humanoid objects can express emotions through movement because human visual perception has a tendency to organize stimuli into the simplest possible order. We tend to believe that order exists in things themselves, even when we know little about their nature. When our senses present the arrangement of things, we can easily imagine them. In other words, the human visual system ignores details to reduce cognitive load, so the judgment of emotional expression in movement is based on the perception of overall patterns.
This point is also proven in the field of animation. For example, animation software summarizes movement patterns through limited operation methods, giving the illusion of life to various creatures and objects, allowing them to display emotions that humans can recognize. PIXAR’s small desk lamp character is a classic case, conveying emotional content through actions without a humanoid appearance.
In the context of the metaverse, where reality and virtual space merge, robots have become a new form of animation. With the assistance of AI, they jump out of the screen and participate in human life. Therefore, this study aims to explore how humans perceive the movement and corresponding emotions of NHRs based on Gestalt psychology and to form hypotheses about the expression patterns of NHRs. This is intended to deepen our understanding of robot emotional expression and provide theoretical guidance for robot design.
Thus, this study makes an overall hypothesis: Human recognition of a robot’s emotional actions also stems from the perception of the overall movement, which is unrelated to the mechanical appearance. Furthermore, it considers the question of how humans recognize the arousal expressed through the movement of NHRs: Is there a movement element that can transcend appearance differences and express the emotion arousal of NHRs based on humans’ underlying cognition?
The innovations of this study are as follows:
(1)
Perspective Innovation: Analyzing the emotional expression of NHRs from the perspective of human cognition breaks the limitations of traditional methods that rely solely on data-driven approaches. It provides theoretical support and practical guidance for the emotional interaction of NHR in complex environments.
(2)
Method Innovation: The proposal of using universal emotional expression elements that are not dependent on specific robot forms or designs can be migrated and applied across different types of NHR. This greatly enhances the universality and practicality of the research results. These elements are also highly replicable and iterative, providing a new basis for the application of machine learning algorithms on NHR. This helps accelerate knowledge accumulation and technological progress, laying the foundation for promoting multimodal and interdisciplinary research.
(3)
Application Innovation: By integrating interdisciplinary methods, theories from psychology and kinesiology are applied to the new field of NHR emotional expression research. This optimizes the human–robot interaction process, ensuring that even users without research experience in robotics can correctly understand the emotional expression of robots. As a result, the naturalness and accuracy of the emotional expression of NHR are improved.
In the literature review section, this paper first introduces the concept of emotional dimensions and the expression of movement. It then analyzes the elements related to the expression of arousal and extracts elements that may have universal value as the research subjects. Following this, based on the theory of physical kinematics constraints, the paper classifies NHRs and identifies key research points that have been overlooked in previous studies.
In the theory and hypothesis section, relevant hypotheses are proposed based on the literature analysis and they are tested in experiments. Then, a comprehensive discussion is conducted. Finally, a summary is provided to conclude the paper.

2. Related Work

2.1. Emotion and Emotional Models

Plutchik believed that emotions are a loosely connected complex chain of events triggered by stimuli, encompassing sensations, psychological changes, impulsive actions, and specific tendencies in behavior [22]. Scherer, on the other hand, viewed emotions as the synchronous and interconnected changes in all or most of the contents of five organic subsystems, including cognitive appraisal, physiological regulation, action tendencies, subjective feelings, and motor expressions, in response to external or internal stimulus events related to the organism’s primary concerns [23]. In summary, emotions are the response and evaluation of stimuli, implicitly reflecting the human process of problem-solving, decision-making, thinking, and perception.
Ekman [24] proposed six discrete basic emotions, including anger, disgust, fear, happiness, sadness, and surprise. Plutchik [25] identified eight basic emotions—anger, disgust, fear, happiness, sadness, surprise, trust, and anticipation—which trigger responses to significant events related to human goals and needs. The classification of emotions can also be based on dimensions. Russell [26] proposed a continuous space of emotions, a circular model in a two-dimensional space divided by pleasure and arousal, known as the circumplex model of emotion. Arousal is represented on the vertical axis, indicating the degree of emotional activation observed, while pleasure is on the horizontal axis, representing the pleasantness and positivity of the emotion. Building on this, the PAD (Pleasure-Arousal-Dominance) space introduced by Mehrabian and Russell [27] became a popular three-dimensional continuous space for representing emotions. The PAD space adds a third dimension to the two axes of the circumplex model of emotion—dominance, which refers to the control and influence over others’ emotions and the external environment. For example, excitement has more dominance than happiness and also has a greater impact on the external world.
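As a small illustration of the dimensional view, the sketch below encodes emotions as points in PAD space and orders them by arousal; the coordinate values are illustrative placeholders rather than normative PAD data:

```python
from dataclasses import dataclass

@dataclass
class PADEmotion:
    """An emotion as a point in Pleasure-Arousal-Dominance space, each axis in [-1, 1]."""
    name: str
    pleasure: float
    arousal: float
    dominance: float

# Illustrative coordinates only: excitement and happiness share high pleasure,
# but excitement carries higher arousal and dominance.
happiness = PADEmotion("happiness", pleasure=0.8, arousal=0.4, dominance=0.4)
excitement = PADEmotion("excitement", pleasure=0.7, arousal=0.9, dominance=0.6)

# Sorting by the arousal coordinate reproduces the vertical axis of the circumplex model.
for e in sorted([happiness, excitement], key=lambda e: e.arousal):
    print(f"{e.name}: arousal={e.arousal}")
```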
This study primarily focuses on the motor expression of arousal. The position of each emotion in the dimensional space is fixed, so the relative ordering of arousal levels among different emotions is also fixed. Even across different emotional models, the positional ordering of emotional content is basically consistent. Therefore, a universal element may exist that manifests different levels of arousal to corresponding degrees.

2.2. Advantages of Action Emotion Expression

Expressing emotions through actions in NHRs offers advantages in terms of implementation methods, expression efficiency, and interactive effects. Firstly, in terms of implementation, most function-oriented non-humanoid robots lack components such as facial expressions and voices [8], making action expression through existing motion components a relatively economical and efficient way to anthropomorphize emotional expression. Secondly, in terms of expression efficiency, action expression is less affected by complex and variable interactive environments, such as environmental noise, language barriers, screen size limitations, long interaction distances, obstructions by objects in the environment, reflections, and backlighting [28], and is also easier for users to understand [29]. Thirdly, in terms of interactive effects, actions are contagious and interactive [30] and can quickly engage and evoke users’ emotions during the interaction process [31], enhancing user engagement and experience quality. It is evident that research on expressing emotions through actions in non-humanoid robots can be applied to different types of NHRs and has a wider range of application scenarios, better expression effects, and greater potential for development. Therefore, discussing how the action expression of NHRs is understood by humans is of great significance for research on robot emotional expression in human–computer interaction.

2.3. Existing Problems in the Expression of Emotional Movements of Robots

The mainstream methods for robots to express emotions can be divided into two categories: one is to use machine learning algorithms, and the other is to find corresponding expression elements based on the robot’s own characteristics according to emotional dimensions.
The principle of using machine learning algorithms is to learn relevant features from a large corpus of labeled sample data, allowing the machine to automatically extract and map these features to generate expression trajectories. Common machine learning methods include spatiotemporal models such as Factored Conditional Restricted Boltzmann Machines (FCRBMs) [32], neural networks [33], and Gaussian Process Dynamical Models (GPDM) [34]. Additionally, the use of large language models focuses on helping machines understand human–robot interaction scenarios [35], infer user preferences [36], and analyze responses to the environment and users [37]. Nevertheless, the generation of expressive behaviors still relies on datasets labeled with expressive behaviors. Therefore, the content learned by machine learning is generally relatively simple; most works still focus on single tasks (such as walking [38]) and a specific structure, the quality of generated complex emotional actions is not high, and a gap remains between the results and human expectations [35]. Because this method requires biomimetic subjects as a database, it has limited applicability to NHRs whose appearance differs significantly from humans and animals.
Research on elements based on emotional dimensions often starts with selecting features from the robot’s own characteristics. Factors related to arousal include speed [39] and acceleration [40], with higher speeds and accelerations representing higher levels of arousal. However, these elements are related to the characteristics of the machine itself and may not be applicable to different types of robots: a large machine cannot move as quickly as a small robot, and a robot that typically moves slowly cannot reduce arousal by moving even slower. Repetition [41] and emphasis [42] can increase arousal to some extent. However, the content of repetition and emphasis varies due to differences among robots. Without basic motion elements, these two cannot play a role.
Additionally, in research on expressing arousal through movement, there are many cases that focus on robotic arms. For example, Xu et al. [39] found that the amplitude of hand movements in the robot Nao is somewhat related to arousal. Claret et al. [43] demonstrated arousal and emotional states by varying the amplitude of the Pepper robot’s waving motion. Hagane and Venture [44] used the concept of geometric entropy to calculate parameters of the robotic arm’s spatial movement range and mapped them to the PAD emotional space.
Furthermore, on highly simulated robotic arms, finger straightness [39] can also express arousal: high arousal is displayed through straighter fingers, while low arousal is expressed through bent fingers.
Overall, these elements are often fragmented, and their selection depends on the specific characteristics and details of the individual robot. Since they lack standardization and theoretical support, and do not explore the rules and mechanisms of expression from the perspective of human cognition, these elements cannot be transferred to other types of NHRs. On one hand, it is difficult to ensure that their use can be correctly understood by ordinary people without experience in robot research and can effectively promote human–robot communication. On the other hand, the methods used are not replicable and iterative, which is disadvantageous for knowledge accumulation and progress. They fail to provide a common research foundation for multimodal and interdisciplinary studies.

3. Theory and Hypothesis

Based on the Gestalt-theoretic cognitive law that humans perceive movement as a whole, amplitude stands out among the elements that express arousal as a relatively universal one. It can plainly demonstrate the arousal state of machinery, and we assume it to be a universal element for expressing arousal across different types of NHRs. So far, the effect of amplitude on emotion has mainly been demonstrated in research on robotic arms. This study explores whether amplitude can indeed serve as a universal element to express the arousal of movement on other types of NHRs, and whether there are laws governing how humans perceive the expression of arousal in NHRs.
Following the explanation of “amplitude” from literature related to the emotional expression of robotic arms [39,43], in this study, “amplitude” is defined as the maximum displacement between two points. This definition differs from the concept of “amplitude” used to describe periodic motion.
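Under this definition, amplitude can be computed directly from a recorded trajectory of the moving structure; a minimal sketch, with a hypothetical vertical trajectory:

```python
def amplitude(positions):
    """Amplitude as defined here: the maximum displacement between two points
    of the trajectory, i.e., the highest minus the lowest position along the
    axis of interest (vertical displacement in this study)."""
    return max(positions) - min(positions)

# Hypothetical vertical trajectory of a moving component, in cm:
heights = [13.0, 25.0, 50.0, 40.0, 31.5]
print(amplitude(heights))  # 37.0 cm: the peak (50.0) minus the lowest point (13.0)
```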

3.1. Classification and Research Gaps of NHRs

Based on the theory of physical motion constraints [45], robots can be divided into two types; see Figure 1. This classification is often used in 3D animation: by guiding actions through only these two constraints, the movements of virtually all living beings can be reproduced. At the same time, NHRs that utilize the same constraints exhibit similar motion characteristics and commonalities.
Type one is primarily guided by IK (inverse kinematics) constraints to display movement. Inverse kinematics is mainly applied in robotic arms or limb endpoints. The movement of this type of NHR is object-oriented, with the components at the mechanical endpoints taking the lead in movement. For instance, human arms are primarily led by the hands at their ends, moving towards the objects they intend to touch as targets. Robotic arms operate similarly. For such NHRs, the movement structure always connects with a supporting structure, such as a base, but the movement does not affect the supporting parts. This is analogous to the way the human body remains still and unaffected while the arm moves. Therefore, robotic arms and similar non-humanoid robots are relatively free and flexible in movement, with the ability to start and stop at will.
Type two is guided by FK (forward kinematics) constraints and mainly applied to the main structural parts of robots. The movement of this type of robot is interconnected with, and influences, the surrounding structures. For instance, the human body’s movement adheres to the principles of forward kinematics, where the movement of the body necessarily impacts the movement of the limbs and head. In this type of NHR, when overall movement is required, the main structure of the robot acts as the decision-maker. Taking a spider-like robot as an example, for it to move as a whole, all its legs must follow the movement decisions made by the body, which serves as the central decision-maker.
Correspondingly, when this type of machinery needs to move as a whole, special situations arise due to its structure. For example, the robot must consider the weight and other influences of its connected structures. It cannot move as freely and flexibly as a robotic arm guided by IK constraints. Generally, a movement is composed of a starting point and an endpoint. However, for NHRs dominated by FK constraints, a special case can occur: a single movement may require a series of continuous actions with coexisting opposite directions. For instance, in a holistic jumping movement, a spider-like robot does not have components that can support it to stop at any height. After jumping, it is subject to the influence of weight and gravity and needs to fall to the ground for support.
At this time, their movement amplitude is not simply composed of a “starting point” and an “end point”, but consists of two sections of movement: the “jump segment” and the “falling segment”, which include three nodes: “starting point–midpoint–end point”. There has been no discussion about human perception of this type of movement. In this case, is the concept of amplitude still applicable in terms of its impact on arousal?
Therefore, the first key point explored in this study is the following: For NHRs guided by FK constraints, when the robot’s action expression is influenced by factors such as weight and gravity, can amplitude also be considered as an element expressing arousal?
The second key point primarily investigates whether NHRs with both types of constraints guiding their movement can also express arousal through amplitude. NHRs that are simultaneously dominated by both IK and FK constraints have a main structure that controls the overall movement and can also perform independent movements of their limbs, representing a complex type that combines both characteristics.
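To make the two constraint types concrete, the sketch below contrasts them on a hypothetical planar two-link limb: under FK, joint angles are set and the endpoint position follows, whereas under IK a target endpoint is set and the joint angles are solved for. This is a textbook two-link example, not a model of any robot discussed here:

```python
import math

L1, L2 = 1.0, 1.0  # link lengths of a hypothetical two-link planar limb

def forward_kinematics(theta1, theta2):
    """FK: given the joint angles, compute where the endpoint lands."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y):
    """IK: given a target endpoint, solve for the joint angles (elbow-down solution)."""
    c2 = (x**2 + y**2 - L1**2 - L2**2) / (2 * L1 * L2)
    theta2 = math.acos(max(-1.0, min(1.0, c2)))  # clamp against rounding error
    theta1 = math.atan2(y, x) - math.atan2(L2 * math.sin(theta2), L1 + L2 * math.cos(theta2))
    return theta1, theta2

# FK: the main structure "decides" the angles, the endpoint follows.
print(forward_kinematics(math.radians(30), math.radians(45)))
# IK: the endpoint leads (object-oriented motion), the joints follow.
print(inverse_kinematics(1.2, 0.8))
```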

3.2. Hypothesis

Based on previous research related to the emotional expression of robotic arms, it has been found that the larger the amplitude, the greater the level of arousal [39,43,44]. At the same time, according to the overall hypothesis of this paper, people’s recognition of mechanical emotional actions also comes from the perception of the overall movement. This study therefore posits that NHRs guided by forward kinematics (FK) constraints in their movement (using spider robots as an example) express arousal levels in the same way as robotic arms: regardless of how they move, the amplitude of their movement is directly proportional to the level of arousal expressed. Based on the first key point, Hypothesis 1 is proposed:
H1. 
In FK-guided NHR, whether in the jump-phase-dominant group or the fall-phase-dominant group, there are significant differences in the arousal levels recognized by participants for the large, medium, and small amplitudes displayed by NHR when compared within the group.
Furthermore, if amplitude is the main influencing factor, then the height of the “midpoint” is the most important element, and it can be inferred that the “jump segment” and “falling segment” themselves have no significant impact on the recognition results. Therefore, Hypothesis 2 is proposed:
H2. 
In FK-guided NHR, when comparing between the jump-phase-dominant group and the fall-phase-dominant group at the same levels of large, medium, and small amplitudes, there is no significant difference in the recognition of arousal levels.
The emotional arousal levels resulting from different degrees of movement amplitude are what this study aims to understand. Based on this, Hypothesis 3 is proposed:
H3. 
In FK-guided NHR, in both the jump-phase-dominant group and the fall-phase-dominant group, the movements of large, medium, and small amplitudes can be recognized by participants as corresponding to high, medium, and low arousal levels.
Additionally, to further explore the universality of amplitude usage in NHRs that possess both structural types, Experiment 2 replicated the study using another NHR. Based on the fundamental hypothesis of this study that there are inherent patterns in people’s recognition of emotional expressions in robots, the following hypothesis is proposed:
H4. 
In NHR guided by both IK and FK, there are significant differences in the arousal levels recognized by participants for the large, medium, and small amplitudes, and these recognition results also increase as the amplitude of NHR’s movement increases from small to large.
To further refine the relationship between “amplitude” and “emotional arousal”, the study conducted hypothetical modeling, as follows:
Definitions:
(1)
The feasible domain of robot movement (the maximum amplitude of the robot) is $M$, with its range of values being $[M_{\min}, M_{\max}]$. In emotional motion expression, the description of the motion range is an independent variable unrelated to the size of the robot.
(2)
$\alpha$ is a proportional coefficient calculated by dividing the position of the expression structure at any time $t$ by the maximum amplitude of motion, so its range of values is $[0, 1]$. In other words, $\alpha$ is a variable related to time and relative motion position, which can be used to represent the vertical displacement at any time. In $\alpha M_{\max}$, $M_{\max}$ can be regarded as a constant, with $\alpha$ the only variable related to time $t$. Therefore, the displacement at time $t$ can be expressed as $\alpha(t) M_{\max}$.
(3)
The instantaneous velocity $v_s$ of the robot’s moving component at time $t$ has the following relationship with the displacement $S$ of the component throughout the movement:
$$S(t) = \int_0^{t} v_s(\tau)\, \mathrm{d}\tau$$
That is, when the movement ends at time $t_e$:
$$\int_0^{t_e} v_s(\tau)\, \mathrm{d}\tau = S_{\max}$$
In this study, all experimental groups move at the same speed, so the velocity of the robot’s moving component is treated as a constant $V_s$; therefore:
$$S(t) = V_s\, t$$
Since this study primarily focuses on the relationship between the amplitude of vertical displacement and emotional expression, changes in instantaneous velocity do not constitute the dominant factor affecting emotional recognition.
The above constitutes the input section of the model.
Output definition: emotional arousal is denoted as $A$. This study refers to the values of the PAD (Pleasure-Arousal-Dominance) scale, selecting four groups of descriptive phrases related to arousal on a 9-point rating scale and taking the average of these ratings. Therefore, based on the results of the PAD scale, the range of emotional arousal values in this study is $A \in [A_l, A_h]$, with $A_l = -2$ and $A_h = 2$ in this example.
$$A = \Omega_1(\alpha, m, t)$$
where $m$ represents the motion-amplitude variable and $t$ denotes time.
Since this study assumes that the component moves at a constant speed, the model simplifies to:
$$A = \Omega_2(\alpha, m)$$
In this study, emotions are categorized into three types based on arousal level: high arousal, medium arousal, and low arousal. Therefore, the feasible domain of $M$ is divided into three intervals, with the boundary points of displacement proportion being $\beta_1, \beta_2, \beta_3$, where $\beta_1, \beta_2, \beta_3 \in (0, 1]$.
The expression models corresponding to the three types of emotions are as follows:
$$E_{A_1} = f(A_1) = \Omega_2(\beta_1, m), \quad E_{A_2} = f(A_2) = \Omega_2(\beta_2, m), \quad E_{A_3} = f(A_3) = \Omega_2(\beta_3, m)$$
In this study, we set the values as follows:
$$\beta_1 = 1/3, \quad \beta_2 = 2/3, \quad \beta_3 = 1$$
The calculation method for $f(A)$ is based on the PAD (Pleasure-Arousal-Dominance) scale, with specific details found in the experimental scale of the following experiment.
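The partition of the feasible domain can be sketched directly; the snippet below labels a motion by its peak displacement proportion using the boundary points above (it returns only the category label, since the mapping $\Omega_2$ itself is estimated empirically from PAD ratings):

```python
BETA_1, BETA_2, BETA_3 = 1/3, 2/3, 1.0  # boundary points of displacement proportion

def arousal_category(alpha_peak):
    """Map the peak displacement proportion alpha (in [0, 1]) to an arousal
    category, following the three-way partition of the feasible domain."""
    if alpha_peak <= BETA_1:
        return "low arousal"
    elif alpha_peak <= BETA_2:
        return "medium arousal"
    else:
        return "high arousal"

def displacement(alpha_t, m_max):
    """Displacement at time t, expressed as alpha(t) * M_max."""
    return alpha_t * m_max

print(arousal_category(0.9))    # high arousal
print(displacement(0.5, 56.0))  # 28.0 cm for the 56 cm amplitude range of Experiment 1
```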

4. Experiment One

The first experiment was conducted based on the assumptions of H1, H2, and H3. To ensure that the experimental results were less affected by the environment and hardware and to achieve higher accuracy in repeated experiments, the experiment was conducted using a combination of simulated videos and psychological measurements. The experiment focused on a spider robot as a representative case study. The simulated video materials included a “jump-phase-dominant group” with a longer jump distance and a “fall-phase-dominant group” with a longer fall distance. Each group consisted of three levels of amplitude, large, medium, and small, to facilitate comparisons within and between groups. To analyze the correlation between amplitude and arousal levels, as well as to determine the final identified emotional types, a psychological measurement experiment using the PAD scale was conducted. This helped to validate whether the hypotheses hold true.

4.1. Simulation Materials

This study chose Blender 3.1 as the primary production platform based on its excellent data compatibility, which allows seamless integration with other software platforms and tight integration into practical engineering applications. Additionally, Blender’s rendering advantages are significant, capable of realistically reproducing the dynamic movements of the simulation environment and the robot’s body, greatly enhancing the realism and fidelity of the experiment.
During the simulation modeling process, the software permits us to record all three-dimensional models and simulation parameters in a parametric form, including key indicators such as the robot’s movement speed and posture changes, ensuring that other researchers can accurately replicate the simulation conditions. We have also meticulously documented the version information of the Blender software and all related plugins to prevent compatibility issues arising from software updates, thus ensuring the replicability of the experiment and the consistency of the results.
The use of this simulation software ensures that the experimental results are minimally affected by the environment and hardware even after multiple repetitions of the experiment. Thus, before the formal experiment, we conducted a pre-experimental test to verify the stability of the simulation environment and the reliability of the simulation results. By collecting feedback from participants on the simulation experiment, we assessed the impact of the simulation environment on participants’ cognition and their acceptance of the simulated robot, providing important references for further research.
To minimize the impact of other interferences, the experiment uses the same spider robot model, moving at the same speed, with the same range of motion, the same color, and the same camera angle. The only difference is in the direction of movement. The basic settings are as follows: The lowest height the spider robot can reach is 13 cm, and the highest height it can reach is 69 cm. This means the range of motion amplitude is 56 cm. The video materials are divided into two groups (see the supplementary materials), as detailed below:
(1)
Jump-Phase-Dominant Group
The “jump-phase-dominant group” has a longer distance during the jump phase, thereby capturing more attention from the participants. Video screenshots are shown specifically in Figure 2. The end positions for the three amplitude groups are the same.
Video 1: Shows a large-amplitude upward movement. The spider robot jumps from the lowest point (13 cm) to the highest reachable amplitude at 69 cm, then falls and stops at one-third of the height (31.5 cm) due to the inability to maintain a suspended state.
Video 2: Shows a medium-amplitude upward movement. The spider robot jumps from the lowest point (13 cm) to two-thirds of the height (50 cm), then falls and stops at one-third of the height (31.5 cm) due to the inability to maintain a suspended state.
Video 3: Shows a small-amplitude upward movement. The spider robot moves from the lowest point (13 cm) to one-third of the height (31.5 cm), then stops moving.
(2)
Fall-Phase-Dominant Group
The “fall-phase-dominant group” has a longer distance during the fall phase, thereby capturing more attention from the participants. The movements in this group are also divided into three groups (1, 2/3, and 1/3 of the overall downward movement amplitude). Video screenshots are shown specifically in Figure 3.
Video 4: Shows the large amplitude range for downward movement. The spider robot first moves upward from one-third of the height (31.5 cm) to the maximum movement height (69 cm), then falls to the lowest height (13 cm) and stops.
Video 5: Shows the medium amplitude range for downward movement. The spider robot first jumps upward from one-third of the height (31.5 cm) to two-thirds of the height (50 cm), then falls to the lowest height (13 cm) and stops.
Video 6: Shows the small amplitude range for downward movement. The spider robot moves downward from one-third of the height (31.5 cm) to the lowest point (13 cm) and stops.
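Simulation motion of this kind can be keyframed programmatically through Blender’s Python API, which supports the parametric recording described above; a minimal sketch of Video 1’s jump-then-fall profile (the object name and frame timings are hypothetical, and the real materials also animate legs and posture):

```python
import bpy  # Blender's Python API; run inside Blender

robot = bpy.data.objects["SpiderRobot"]  # hypothetical object name

# Heights in metres, matching Video 1: start at 0.13 m, peak at the full
# 0.69 m amplitude, then settle at one-third height (0.315 m).
keyframes = [(1, 0.13), (25, 0.69), (49, 0.315)]  # (frame, z-height); timings illustrative

for frame, z in keyframes:
    robot.location.z = z
    robot.keyframe_insert(data_path="location", index=2, frame=frame)  # index 2 = z-axis
```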

4.2. Experimental Scale

The scale used in this study is the Chinese version of the PAD (Pleasure-Arousal-Dominance) Emotional Scale revised by the Institute of Psychology, Chinese Academy of Sciences, based on the Chinese context [46]. The PAD scale has extensive applications in fields such as psychology, market research, and user experience design, and it can provide strong support for understanding the underlying cognitive patterns of human emotions towards Non-Humanoid Robots (NHR) in this study. Here, the focus is primarily on the degree of emotional activation and the intensity of emotions (i.e., arousal). Using the PAD scale can reveal the consistency and differences between the arousal levels of different emotional states, which helps in comparing emotional responses across various situations. At the same time, since this study employs a PAD scale revised for the Chinese context, it takes into account the characteristics of emotional expression and experience within the Chinese cultural background, enhancing the scale’s applicability and accuracy. This scale converts subjective emotional experiences into quantifiable data through standardized scoring methods, facilitating statistical analysis.
This scale has shown good reliability and construct validity. The simplified Chinese version of the PAD scale uses a 9-point semantic differential scale, with ratings ranging from −4 to 4. The scale uses 4 sets of descriptive words for the three emotional dimensions, Pleasure (P), Arousal (A), and Dominance (D), totaling 12 sets of descriptive words. Each set of descriptive words consists of a pair of opposite adjectives for the dimension it represents, while being basically the same for other dimensions. A higher absolute rating, such as choosing 4 or −4, indicates that the subject’s chosen emotion is closer to the adjective at that end. The closer the score is to the middle, the lower the degree of similarity between the subject’s emotional experience and the adjective. The middle score is 0, which represents that the adjectives at both ends of the group equally or weakly describe the subject’s emotion. This study mainly focuses on the expression patterns of emotions through actions. Therefore, only the sets of emotional words related to the arousal dimension of the PAD scale are used, totaling 4 sets. The descriptive words for the dimensions are interspersed and arranged with reverse choices for specific question numbers. The specific dimension descriptive word groups are shown in Table 1:
Among them, Q1, Q2, Q3, and Q4 are the vocabulary for the arousal dimension. Additionally, Q2 and Q4 are positive options, while Q1 and Q3 are reverse options. The final arousal score is obtained by calculating the mean. See Table 2 for details.
In the scoring calculation formula, positive options are added together and reverse options are subtracted. After summing up, the result is divided by four. The mean calculation result serves as the final arousal score.
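A minimal sketch of this scoring rule, with item labels following Table 2 (the example ratings are hypothetical):

```python
def arousal_score(q1, q2, q3, q4):
    """Mean arousal from the four PAD arousal items: positive items (Q2, Q4)
    are added, reverse items (Q1, Q3) are subtracted, and the sum is divided by four.
    Each rating is on the 9-point scale from -4 to 4."""
    return (q2 + q4 - q1 - q3) / 4

# Hypothetical ratings from one participant for one video:
print(arousal_score(q1=-2, q2=3, q3=-1, q4=2))  # (3 + 2 + 2 + 1) / 4 = 2.0
```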

4.3. Experimental Participants

All participants were recruited from university campuses, encompassing students with various academic backgrounds. To ensure the purity of the experimental results, we specifically selected students without prior experience working with robots to minimize the potential influence of previous experience on the outcomes. The participants voluntarily joined the study, totaling 30 individuals. Their age range was from 22 to 35 years, with an average age of 25.63 years and a standard deviation of 4.4 years. The gender distribution was balanced, with 16 male and 14 female participants. There were 9 students with backgrounds in liberal arts and arts and 21 students with backgrounds in science and engineering, reflecting disciplinary diversity. All participants had normal vision (including uncorrected or corrected vision) to ensure they could successfully complete the experimental tasks. Additionally, participants had no other physiological or psychological issues that might affect their performance in the experiment. Each participant completed the experiment independently to avoid interference with each other. As a token of appreciation for their participation, each participant received a small gift upon completing the experiment.

4.4. Experimental Procedure

Before the experiment, each participant filled out demographic information. Then, they were informed about the task content. The main focus was on explaining the meaning of the PAD scale used in the experiment to the participants: through examples, the meanings of the descriptive words in the PAD scale were explained, as well as the meanings represented by the 9-point scale ranging from −4 to 4. This helped the participants understand how to use the scale and the related evaluation criteria. Once the participants were ready, the experiment began. After playing the video, the participants filled out the scale. Each video was played three times, and after completing one set of scales, the next video was played. To reduce the influence of the simulation videos on each other, the upward and downward movement simulation videos were played in an alternating order.
The questionnaire was implemented using an online web editor and was conducted on a computer connected to the internet. The display screen was 13 inches with a resolution of 1920 × 1080. The experimental environment was kept quiet, and the lighting brightness and illuminance met the lighting standards. The experimental data were analyzed using SPSS Statistics 25.

4.5. Experimental Results

The analysis of the experimental results focuses on the hypotheses proposed in this chapter, as follows:
H1. 
In FK-guided NHR, whether in the jump-phase-dominant group or the fall-phase-dominant group, there are significant differences in the arousal levels recognized by participants for the large, medium, and small amplitudes displayed by NHR when compared within the group.
The effects of different amplitudes within each group on the recognition of arousal levels were compared using one-way ANOVA; see Figure 4.
From an intragroup comparison perspective, in the jump-phase-dominant group, there was no significant difference in the mean values between the three amplitudes (F = 2.548, p = 0.084). In the fall-phase-dominant group, there were significant differences among all three amplitudes (F = 37.36, p < 0.001). Post hoc multiple comparison analysis with Bonferroni correction showed that, in the fall-phase-dominant group, the mean comparisons between large and medium amplitude, medium and small amplitude, and large and small amplitude all had p-values less than 0.001.
Therefore, from the experimental results, it can be inferred that Hypothesis H1 is not fully established. In this experiment, the jump-phase-dominant group does not conform to the hypothesis, while the fall-phase-dominant group does.
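The same within-group comparison can be reproduced outside SPSS; a minimal sketch with SciPy on placeholder ratings (the arrays are illustrative, not the study’s data, and the Bonferroni step is a simplified approximation of SPSS’s post hoc procedure):

```python
from scipy import stats

# Hypothetical arousal scores for the three amplitude levels of one group:
large  = [1.5, 2.0, 1.0, 1.75, 2.25]
medium = [0.5, 0.0, 0.75, 0.25, 1.0]
small  = [-1.0, -0.5, -1.25, -0.75, 0.0]

# One-way ANOVA across the three amplitude levels:
f_stat, p_value = stats.f_oneway(large, medium, small)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

# Pairwise post hoc t-tests with Bonferroni correction (3 comparisons):
pairs = [("large-medium", large, medium),
         ("medium-small", medium, small),
         ("large-small", large, small)]
for name, a, b in pairs:
    t, p = stats.ttest_ind(a, b)
    print(f"{name}: corrected p = {min(p * len(pairs), 1.0):.4f}")
```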
H2. 
In FK-guided NHR, when comparing between the jump-phase-dominant group and the fall-phase-dominant group at the same levels of large, medium, and small amplitudes, there is no significant difference in the recognition of arousal levels.
A chi-square test was used to analyze whether there was a significant difference between the jump-phase-dominant group and the fall-phase-dominant group at different amplitude levels. The results showed a significant difference between the two groups (χ² = 22.49, p < 0.001). Independent-sample t-tests were used for pairwise comparisons of arousal levels at each amplitude level, as shown in Figure 5.
Therefore, from the experimental results, it can be inferred that Hypothesis H2 is not fully established. In this experiment, the hypothesis holds at the large amplitude level, but at the medium and small amplitude levels there are significant differences in the recognition of arousal levels between the jump-phase-dominant and fall-phase-dominant groups.
H3. 
In FK-guided NHR, in both the jump-phase-dominant group and the fall-phase-dominant group, the movements of large, medium, and small amplitudes can be recognized by participants as corresponding to high, medium, and low arousal levels.
To visually compare the results of arousal levels expressed by different ranges of motion, this study compares the results using the PAD Emotional Scale revised by the Institute of Psychology, Chinese Academy of Sciences, tailored to the Chinese context. The values for basic emotional arousal in this scale are referenced in Table 3.
Since the positions of different emotions in the spatial dimensions are fixed, this study can visually inspect the impact of the range of motion on the level of arousal by using the emotional spatial locations. Placing the emotional results in the dimensional space, the positions of each emotional result are shown in the following Figure 6.
From the figure, it can be seen that the arousal levels of the emotional content corresponding to the three amplitudes of movement increase as amplitude increases, which is consistent with past research findings. However, the jump-phase-dominant group showed relatively small changes in arousal across the three amplitudes.
Further correlation analysis was conducted, and the results are presented in Table 4.
In the jump-phase-dominant group, the correlation between amplitude and arousal is not significant (r = 0.187, p = 0.078). In the fall-phase-dominant group, there is a significant correlation between amplitude and arousal (r = 0.680, p < 0.001).
Overall, Hypothesis H3 is not established in the jump-phase-dominant group but is supported in the fall-phase-dominant group.

4.6. Discussion

Different from previous studies that used machine learning algorithms to generate robot expressions and selected expression elements based on the robot’s own detailed features, this research analyzes whether the amplitude of robot movement can be a general element that affects human recognition of NHR’s arousal from the perspective of human cognition, based on Gestalt theory.
By utilizing the kinematics constraint theory, widely applied in three-dimensional animation, existing NHRs are categorized into those dominated by IK constraint and those dominated by FK constraint. Previous studies have indicated that amplitude is helpful in recognizing arousal in NHRs dominated by IK constraint. This experiment validates that amplitude is also applicable to NHRs dominated by FK constraint when a motion consists of two segments with opposite directions. In the stages where amplitude plays a role, the recognition of arousal in NHRs by humans increases as the amplitude increases.
The experimental results show no significant difference in recognized arousal within the jump-phase-dominant group, indicating that the jump phase is not the main motion phase through which amplitude is perceived. In the fall-phase-dominant group, recognized arousal follows the order large amplitude > medium amplitude > small amplitude, suggesting that the fall phase plays a substantial role in participants’ emotional recognition. From this we draw Conclusion 1: in non-humanoid robots with “jumping” and “falling” movements, the amplitude of the fall phase is the main motion interval affecting the recognition of arousal. Further analysis of the jump-phase-dominant group shows no significant within-group differences in arousal; since the jump phase plays no role and all videos in this group share the same, relatively high endpoint (at 1/3 of the total amplitude), the endpoint evidently plays a major role in arousal recognition, and raising the endpoint helps participants recognize a higher arousal level. Inter-group comparisons confirm this point: when falling from the same height, a higher endpoint position results in higher recognized arousal. From this, Conclusion 2 follows: a higher endpoint position increases the recognized arousal level.
The “Peak-End Rule” [48] suggests that people’s impression of an event is primarily formed by the peak and the end stages of the event, with these two stages having the most profound impact on the overall experience. The results of this experiment, to some extent, resonate with this rule. That is, within a combined motion, the highest point reached by the motion’s amplitude and the endpoint where the motion concludes play a significant role in the process of recognizing the arousal level of NHRs.
To apply these findings to other robots, three points can be considered:
Firstly, in other NHRs guided by FK, if there is a similar two-stage motion pattern, special attention should be paid to the amplitude and end position of the subsequent motion stage. Since these motion characteristics exhibit universality among NHRs of the same type, it is reasonable to speculate that under similar motion patterns, the influence of motion amplitude and end position on arousal recognition will also show a similar trend.
Secondly, in situations where motion space or power resources are limited, such NHRs can strategically elevate the end position of the motion to increase arousal. This method can provide an effective tool for robot designers to optimize the emotional expression of robots in resource-constrained environments.
Thirdly, our research also indicates that during key motion stages that affect human understanding of emotions, motion amplitude is indeed an important factor, and the recognition of arousal increases with the amplitude. Therefore, for NHRs similarly guided by FK constraints, adjusting the motion amplitude during key motion stages that influence emotional recognition can effectively convey the level of arousal.

5. Experiment Two

Experiment Two examines whether arousal recognition under different ranges of motion is consistent with the patterns observed above. Based on the definition of amplitude in this study as “the maximum displacement between two points”, it is difficult to judge the overall amplitude of the motion of complexly shaped NHRs from a single angle. Therefore, to facilitate operation and understanding, we categorize the motion as dominated by either the horizontal or the vertical direction based on the overall spatial range. The amplitude of different movements within the same group of robots is compared through the horizontal or vertical range of motion. In this experiment, vertical displacement was used to measure amplitude.
The related hypothesis is as follows:
H4. 
In NHR guided by both IK and FK, there are significant differences in the arousal levels recognized by participants for the large, medium, and small amplitudes, and these recognition results also increase as the amplitude of NHR’s movement increases from small to large.

5.1. Experimental Methods

In this experiment, the same PAD scale used in the previous section was employed for analysis. Since the direction of movement is not a primary factor affecting the recognition of arousal levels [39,42], we chose to focus on only one direction of movement in this study. The simulation materials were created using the open-source 3D animation software Blender, and an NHR model with both types of constraint-guided structures was used (see the supplementary materials). The basic settings of the model are as follows: the maximum reachable height is 47.2 cm, and the minimum height is 18.8 cm. This means the range of motion amplitude is 28.4 cm.
The movement is divided into three amplitudes, all using the same speed. The overall height reached by the small amplitude group is 28.2 cm, accounting for 1/3 of the total amplitude; the overall height reached by the medium amplitude group is 37.5 cm, accounting for 2/3 of the total amplitude; the overall height reached by the large amplitude group is 47.2 cm, representing 100% of the total amplitude. To reduce visual interference caused by color, only neutral gray is used in the simulation video. See Figure 7 for details.
The upward amplitudes of the IK-dominant structure and the FK-dominant structure are likewise divided into three equal parts based on each structure’s own movement range, increasing in sequence across the three amplitude groups.

5.2. Experimental Participants

The participants were also university students, and none of them had prior experience with robots to avoid potential experience-related biases in the experimental results. Thirty students voluntarily participated in the experiment, with a gender distribution of 21 males and 9 females. Their age ranged from 19 to 32 years, with an average age of 23.6 years. In terms of academic backgrounds, there were 19 students from science and engineering disciplines and 11 students from liberal arts. All participants had normal vision (including uncorrected or corrected vision), ensuring they could complete the experimental tasks without obstacles. The participants did not report any physiological or psychological issues that might affect the experimental results. Each participant completed the experiment independently to avoid external interference. As a token of appreciation for their participation, each student who completed the experiment received a small gift.

5.3. Experimental Procedure

The participants watched the videos and made scale selections. First, the participants filled out demographic information, and then they were introduced to the relevant information for filling out the scale. This mainly involved explaining the scale’s descriptive words and the meaning of the scale’s −4 to 4 point scale to help the participants understand the method of using the scale and the assessment criteria. Once the participants were ready, the experiment officially began. Each video was played three times before the corresponding scale selection appeared, and once the participant had filled it out, the next video continued to play. The average time taken was 12 min. At the end of the experiment, the participants received a small gift.
The questionnaire was implemented using an online web editor and was conducted on a computer device connected to the internet. The display screen was 13 inches with a resolution of 1920 × 1080. The experimental environment was kept quiet, and the lighting brightness and illuminance met the lighting standards. The experimental data were analyzed using SPSS Statistics 25.

5.4. Experimental Results

The results of the three amplitude levels of movement were analyzed using a one-way ANOVA, and the pairwise comparison results came from post hoc LSD multiple comparison analysis; see Figure 8.
From the results, it can be observed that the recognition results of the three groups of arousal levels decrease sequentially from the largest to the smallest amplitude. The mean of the large amplitude is 0.6, the mean of the medium amplitude is −0.18, and the mean of the small amplitude is −0.92. The means of all three groups are significantly different.
Correlation analysis was performed between amplitude and arousal, with the specific results shown in Table 5.
The Spearman correlation coefficient was r = 0.414 (p < 0.001), indicating a significant positive correlation between the two variables.
Therefore, Hypothesis H4 holds. On NHRs with both structural types, the changes in large, medium, and small amplitudes correspond one-to-one with the recognition results of high, medium, and low arousal levels. The theoretical model proposed by the research has also been validated.
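For reference, a minimal SciPy equivalent of the correlation test above, with placeholder data (not the study’s ratings):

```python
from scipy.stats import spearmanr

# Amplitude coded ordinally (1 = small, 2 = medium, 3 = large),
# paired with hypothetical arousal scores:
amplitude_level = [1, 1, 2, 2, 3, 3, 1, 2, 3]
arousal = [-1.0, -0.8, -0.3, 0.1, 0.5, 0.8, -0.9, -0.2, 0.6]

r, p = spearmanr(amplitude_level, arousal)
print(f"Spearman r = {r:.3f}, p = {p:.4f}")
```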

5.5. Discussion

In order to understand whether NHRs guided by both constraints simultaneously can also express different levels of arousal through varying degrees of amplitude, this experiment conducted a simulation using a case study of an NHR.
The study measured participants’ recognition of arousal levels in NHR using the PAD scale and conducted a one-way ANOVA combined with LSD post hoc multiple comparisons and correlation analysis using SPSS Statistics 25 for data analysis. The results showed that in the experimental scenario where two constraints guided the movement of NHR, there were significant differences in the recognition of arousal levels among the three groups with large, medium, and small motion amplitudes, and these differences basically followed the model hypothesis that the size of the amplitude corresponds to the size of the arousal level. This validates the effectiveness and accuracy of the model in describing the relationship between motion amplitude and the recognition of arousal levels.
In this experiment, the correlation coefficient between amplitude and arousal is lower compared to Experiment 1, which may be due to differences in mechanical construction and action patterns. Since the overall range of motion in Experiment 1 (56 cm) is greater than that in Experiment 2 (28.4 cm) for the NHR, the relationship between motion amplitude and arousal may not be as pronounced in Experiment 2 as in the wider range of Experiment 1. These differences could have affected the users’ perceptions and reactions during interactions with the robot, leading to different correlation coefficients. However, overall, both experiments show a correlation between amplitude and arousal. This, to some extent, demonstrates the role of Gestalt theory in human recognition of NHR arousal.
Currently, our research is primarily conducted in simulated environments, which, although helpful for rapid iteration and optimization of models, cannot fully replicate the complexity and uncertainty of the real world. Future research can extend beyond simulated environments to test and validate the applicability and effectiveness of our mathematical models using actual non-humanoid robot prototypes. This includes evaluating the model’s performance under various conditions, such as different environments, user groups, and interactive tasks.
In addition, this study focuses on motion amplitude as the central variable, while the potential roles of speed and acceleration in emotional perception have not been explored. For future research, on one hand, it is possible to discuss the roles of speed and acceleration in emotional perception and determine the specific relationship model between speed, acceleration, and emotional perception. On the other hand, research can explore how to organically combine the three variables of speed, acceleration, and motion amplitude to construct a more comprehensive and precise robot emotional expression model, thereby enhancing the accuracy and effectiveness of emotional conveyance in human–robot interaction.

6. General Discussion

Previous studies on robot action expression, which use machine learning for motion replication, often produce actions that do not fully align with human expectations [35]. These studies rely on the robot's own characteristic elements for expression without exploring the rules and mechanisms of expression from the perspective of human cognition. On the one hand, this makes it difficult to ensure that ordinary people without robotics experience can correctly interpret the resulting expressions; on the other hand, the methods are neither replicable nor iterative, which hinders the accumulation of knowledge and scientific progress.
Based on the Gestalt theory in psychology, this study proposes a general hypothesis: human recognition of robot emotional actions also stems from the perception of overall motion, independent of the mechanical appearance. It further inquires whether there are potential mechanisms, such as expressible elements that transcend appearance, through which humans can perceive the arousal level expressed by the movements of NHRs. The experimental results demonstrate that amplitude can help manifest arousal not only in NHRs dominated by IK constraints but also in those dominated by FK constraints and in NHRs where both IK and FK constraints play a dominant role. This evidence suggests that Gestalt psychology has certain applicability in the field of robot emotional action expression.
This study introduces Gestalt psychology to analyze human cognitive patterns and applies it to the emotional action expression of NHRs, pioneering a new research pathway. This approach aids NHRs in learning the rules of emotional expression from limited examples, not only enhancing the efficiency of machine learning but also broadening the applicability to a wider range of NHR types. It liberates the subjects of emotional expression from being confined to anthropomorphic or biomimetic robots. The action expression algorithm developed based on human cognitive patterns can enrich and streamline the emotional expression of robots. This innovative application has the potential to bridge the gap between robot emotional expression and human expectations, thus propelling the further development of robot emotional interaction technology. Additionally, it can foster knowledge accumulation and scientific progress, providing a unified foundation for multimodal and interdisciplinary research.
In contrast to cues such as facial expressions, voice modulation, and LED color changes, the use of motion amplitude for expression possesses distinctive attributes. Facial expressions are intuitive, readily discernible, and highly effective in conveying complex emotions and social cues, but they are constrained by the design limitations of NHRs. Motion amplitude, however, is not bound by the robot's physical appearance and is inherently more noticeable.
Voice modulation can communicate a wealth of emotional content through varying pitch, volume, and rhythm, but it can be masked by environmental noise and may be inappropriate in sound-sensitive settings. Motion amplitude, conversely, offers a more spatial and intuitive form of expression that is less susceptible to acoustic interference.
Visual cues like LED color changes can effectively signal emotion and rapidly engage user attention in certain contexts, yet they are limited in expressing intricate emotional states. Motion amplitude, however, can transcend this limitation, showcasing the rich layers of emotions in a more nuanced and dynamic way. Consequently, in the design and execution of emotional expression for NHRs, motion amplitude emerges as a distinctive and effective method.
The findings of this research are poised to make significant contributions in practical settings. For instance, in the domain of service robotics, the discovery that motion amplitude influences users’ emotional perceptions can be leveraged to elevate service quality and user experience. Consider a scenario where a hotel service robot guides guests to their rooms; by modulating its motion amplitude, it can convey emotions of enthusiasm, friendliness, or reassurance, enhancing the perceived thoughtfulness of the service and, in turn, boosting guest satisfaction with the hotel experience.
In the area of healthcare, the application of emotional perception through motion amplitude also has great potential. For patients with psychological illnesses or those undergoing rehabilitation, medical robots can express emotions such as care and encouragement through appropriate motion amplitudes based on the patient’s emotional state and treatment stage. For instance, during rehabilitation training, the robot can guide the patient to actively participate in the training with moderate and vibrant motion amplitudes, inspiring the patient’s confidence and motivation in recovery and assisting medical staff in improving treatment effectiveness.
In the field of education, educational robots can adjust their motion amplitudes according to the teaching content and students' learning states to make lessons more engaging and attractive. When explaining historical stories, robots can imitate ancient people's actions with larger motion amplitudes to represent key plot points in historical scenes, helping students better understand and remember the material. During quiet classroom interactions, robots can use smaller, gentler motion amplitudes to create a focused learning atmosphere.
However, our focus was on analyzing the correlation between the NHR motion characteristics and the expression of emotional arousal. Since there are significant differences in functionality, structure, and application scenarios among different types of NHRs, there are potential limitations when generalizing our current findings to various NHRs. For instance, the motion of drones is strictly constrained by factors like aerodynamics and flight stability, which may differ from the motion patterns and constraints of ground-based robots. Self-driving cars, on the other hand, are focused on road travel and are influenced by traffic rules, road conditions, and other factors during operation, creating different interaction scenarios and methods compared to our research subject.
Therefore, when applying these findings to different types of NHRs, it is necessary to carefully consider the aforementioned limitations. Future research can further explore the characteristics of these different types of NHRs and how to extend our research findings to a wider range of robotic applications.

7. Conclusions

In the process of human cognition of NHRs, identifying the universal factors that influence the recognition of NHR arousal is crucial. From the perspective of human cognition, this study applies the IK and FK constraint theory used in 3D animation to reconstruct realistic movements, classifies NHRs accordingly, and innovatively proposes "amplitude" as a potential universal factor for measuring the arousal that NHRs display and humans can recognize. This hypothesis is supported by previous research indicating that humans can recognize arousal based on the motion amplitude of IK-dominant NHRs [39,42].
However, for FK-constraint-dominant robot movements, especially continuous actions composed of movements in two opposite directions, the effect of amplitude on arousal had not been established. To investigate the potential relationship between amplitude and the recognition of arousal, this study designed two experiments. Experiment 1 focused on the influence of amplitude on arousal in the vertical jumping movements of FK-dominant NHRs; Experiment 2 targeted complex NHRs with both IK and FK motion constraints, studying the role of amplitude in the recognition of their arousal. In the experimental design, we used the PAD scale to measure arousal and applied statistical methods to analyze the relationship between the three amplitude levels and the results of arousal recognition, ensuring the reliability and scientific validity of the research.
The research results indicate that in the jumping motion of FK-dominant NHRs, amplitude has a significant impact on the recognition of arousal during the critical phases that influence human perception. Additionally, in complex NHRs constrained by both IK and FK, amplitude also plays a significant role in humans’ recognition of the robots’ arousal. Notably, the relationship between amplitude and arousal shows a high degree of consistency across different types of NHRs, further supporting the view that “amplitude” is a universal factor influencing the recognition of NHR arousal. To further verify the universality of this conclusion, future research should include a wider range of NHR forms and consider differences across different cultures and populations.
The significance of this study in terms of technological advancement, theoretical value, and practical application is as follows:
  • Technological Advancement: This study, from a universal perspective, delves into how humans recognize the emotional arousal of robots. It not only enriches the application of Gestalt psychology in the field of robot emotional expression but also provides important theoretical support for the practical application of NHRs in emotional interaction. The results of this study contribute to designing more effective emotional expression strategies for NHRs, enhancing the naturalness and efficiency of human–robot interaction in practical applications, and laying the foundation for the further development of robot emotional computing.
  • Theoretical Value: This study innovatively classifies NHRs through physical motion constraints and deeply analyzes the types of NHR applications and emotional expression issues that previous studies have not touched upon. This innovative work not only fills a gap in academic research but also provides new perspectives and methods for the design and application of NHRs, which is of great significance for promoting the diversification and personalized development of robot technology.
  • Practical Application: This study's mastery of the emotional arousal expression rules of NHRs is directly related to improving robots' expression efficiency and effectiveness. By optimizing motion amplitude based on human recognition rules, this study provides precise guidance for the emotional expression of NHRs in different scenarios. This optimized expression method is easier for users to understand and recognize, significantly improving the efficiency of human–robot interaction, and further promoting in-depth cooperation between robots and humans in various fields. It provides practical support for the widespread application of intelligent robots and technological progress in human society. A simple illustration of this amplitude-based expression rule is sketched after this list.
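By way of illustration only, the following hypothetical sketch shows how a designer might map a target arousal level onto an amplitude scaling factor in line with the pattern found here; the breakpoints and multipliers are invented for demonstration, not derived from the study's data.

```python
# Illustrative design sketch: a hypothetical mapping from a target arousal
# level to a motion-amplitude scaling factor, following the finding that
# larger amplitude reads as higher arousal. Thresholds and factors are
# invented placeholders, not results from the study.
def amplitude_scale(target_arousal: float) -> float:
    """Map a target arousal in [-1, 1] to an amplitude multiplier."""
    if target_arousal < -0.33:
        return 0.5   # small amplitude -> low arousal
    if target_arousal < 0.33:
        return 1.0   # medium amplitude -> medium arousal
    return 1.5       # large amplitude -> high arousal

# e.g., scaling a greeting gesture to convey enthusiasm (high arousal)
base_gesture_height_cm = 20.0
print(base_gesture_height_cm * amplitude_scale(0.8))  # -> 30.0 cm
```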
Future research can further deepen the exploration of human recognition rules, analyzing and refining general elements applicable to the expression of different emotional dimensions. This will help construct a more systematic and comprehensive emotional expression framework, providing precise guidance for the emotional interaction design of NHRs. Additionally, building on this study, we can explore the best fusion strategies for multimodal emotional expression. Through careful design, robots can skillfully combine other sensory expression means, such as sound and light, while using the most basic behavioral actions for emotional expression. This multimodal fusion can not only refine and enrich the content of emotional expression but also enhance the overall efficiency and vividness of expression, achieving more nuanced and three-dimensional emotional communication.
At the same time, ethical issues are a key area that cannot be overlooked in human–robot interaction. Regarding the potential issue of emotional manipulation, for instance, effective mechanisms are needed to detect and prevent robots from inappropriately guiding or misleading human emotions, so as to avoid the risks that may arise when robots influence human behavior through emotional expression. Future research can explore how algorithm design can ensure that robots' emotional expressions are based on genuine information, thereby avoiding the transmission of false emotional signals and safeguarding the emotional autonomy of human users.
In terms of trust building, it is important to make the emotional expressions of robots more transparent and interpretable so that human users can understand the logic behind the robots’ behaviors. This will help establish trust based on genuine understanding and reliable interaction, avoiding trust crises caused by misunderstandings.
Unexpected consequences may arise in human–robot interaction, such as the potential social and psychological chain reactions that robot emotional expressions might trigger, including users becoming overly dependent on robots or transferring emotions to human companions. Future research can collaborate with experts in related fields to jointly assess these potential impacts and develop corresponding strategies to minimize risks.
Although the current focus of this study is on human recognition and understanding of arousal in NHRs, it has not yet delved into the recognition and expression mechanisms of other emotional dimensions, nor into the development of multimodal expression methods. Nevertheless, this study opens up a new research perspective: through the cross-analysis of psychology and kinematics, it reveals how the emotional expression ability of robots can be enhanced more effectively.
Against the backdrop of the rapid development of robotics and its increasing integration into daily life, the findings of this study not only help non-humanoid robots better provide emotional interaction experiences as they integrate into human society but also blaze a new trail for the development of this field. Therefore, this study plays a significant role in promoting the expansion of application fields, occupying a broader market, and enhancing the value and influence of non-humanoid robots in human life.

Supplementary Materials

The simulation video materials presented in the study are openly available on OSF at https://doi.org/10.17605/OSF.IO/BXM8G.

Author Contributions

Conceptualization, Q.X. and D.L.; methodology, Q.X.; software, Q.X.; validation, Q.X. and Z.C.; formal analysis, Z.C.; investigation, Q.X.; resources, Q.X.; writing—original draft preparation, Q.X.; writing—review and editing, Z.C. and D.L.; visualization, Q.X.; supervision, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62203123 and the Guangzhou Science and Technology Planning Project under Grant 202206030005.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We extend our gratitude to Weidan Sun for her technical support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Khan, N.A.; Hussain, S.; Spratford, W.; Goecke, R.; Kotecha, K.; Jamwal, P.K. Deep Learning-Driven Analysis of a Six-Bar Mechanism for Personalized Gait Rehabilitation. ASME J. Comput. Inf. Sci. Eng. 2025, 25, 011001. [Google Scholar] [CrossRef]
  2. Khan, N.A.; Goyal, T.; Hussain, F.; Jamwal, P.K.; Hussain, S. Transformer-Based Approach for Predicting Transactive Energy in Neurorehabilitation. IEEE Trans. Neural Syst. Rehabil. Eng. 2025, 33, 46–57. [Google Scholar] [CrossRef]
  3. Alverhed, E.; Hellgren, S.; Isaksson, H.; Olsson, L.; Palmqvist, H.; Flodén, J. Autonomous last-mile delivery robots: A literature review. Eur. Transp. Res. Rev. 2024, 16, 4. [Google Scholar] [CrossRef]
  4. Fang, S.; Han, X.; Chen, S. Hotel guest-robot interaction experience: A scale development and validation. J. Hosp. Tour. Manag. 2024, 58, 1–10. [Google Scholar] [CrossRef]
  5. Yang, Q.; Du, X.; Wang, Z.; Meng, Z.; Ma, Z.; Zhang, Q. A review of core agricultural robot technologies for crop productions. Comput. Electron. Agric. 2023, 206, 107701. [Google Scholar] [CrossRef]
  6. Xiao, B.; Chen, C.; Yin, X. Recent advancements of robotics in construction. Autom. Constr. 2022, 144, 104591. [Google Scholar] [CrossRef]
  7. Liu, Z.; Liang, X.; Chen, X.; Wen, X. Design of a sweeping robot based on fuzzy QFD and ARIZ algorithms. Heliyon 2024, 10, 38319. [Google Scholar] [CrossRef]
  8. Bonarini, A. Can my robotic home cleaner be happy? Issues about emotional expression in non-bio-inspired robots. Adapt. Behav. 2016, 24, 335–349. [Google Scholar] [CrossRef]
  9. Cauchard, J.R.; Zhai, K.Y.; Spadafora, M.; Landay, J.A. Emotion Encoding in Human-Drone Interaction. In Proceedings of the Eleventh ACM/IEEE International Conference on Human Robot Interaction, Christchurch, New Zealand, 7–10 March 2016; IEEE Press: New York, NY, USA, 2016; pp. 263–270. [Google Scholar]
  10. Wang, K.; Sang, G.-Y.; Huang, L.-Z.; Li, S.-H.; Guo, J.-W. The Effectiveness of Educational Robots in Improving Learning Outcomes: A Meta-Analysis. Sustainability 2023, 15, 4637. [Google Scholar] [CrossRef]
  11. Dupont, P.E.; Nelson, B.J.; Goldfarb, M.; Hannaford, B.; Menciassi, A.; O'Malley, M.K.; Simaan, N.; Valdastri, P.; Yang, G.-Z. A decade retrospective of medical robotics research from 2010 to 2020. Sci. Robot. 2021, 6, eabi8017. [Google Scholar] [CrossRef]
  12. Kozima, H.; Michalowski, M.P.; Nakagawa, C. Keepon: A playful robot for research, therapy, and entertainment. Int. J. Soc. Robot. 2009, 1, 3–18. [Google Scholar]
  13. Szabóová, M.; Sarnovsky, M.; Kreňáková, V.; Machova, K. Emotion Analysis in Human–Robot Interaction. Electronics 2020, 9, 1761. [Google Scholar] [CrossRef]
  14. Venture, G.; Kulić, D. Robot Expressive Motions: A Survey of Generation and Evaluation Methods. ACM Trans. Hum.-Robot Interact. 2019, 8, 20. [Google Scholar] [CrossRef]
  15. Broekens, J.; Brinkman, W. AffectButton: A method for reliable and valid affective self-report. Int. J. Hum.-Comput. Stud. 2013, 71, 641–667. [Google Scholar] [CrossRef]
  16. Klęczek, K.; Rice, A.; Alimardani, M. Robots as Mental Health Coaches: A Study of Emotional Responses to Technology-Assisted Stress Management Tasks Using Physiological Signals. Sensors 2024, 24, 4032. [Google Scholar] [CrossRef] [PubMed]
  17. Jessup, S.A.; Schneider, T.R. Chapter 22—The Role of Emotions in Human-Robot Interactions; Academic Press: Amsterdam, The Netherlands, 2021. [Google Scholar]
  18. Elliott, M.V.; Johnson, S.L.; Pearlstein, J.G.; Lopez, D.E.M.; Keren, H. Emotion-related impulsivity and risky decision-making: A systematic review and meta-regression. Clin. Psychol. Rev. 2023, 100, 102232. [Google Scholar] [CrossRef] [PubMed]
  19. Law, T.; Leeuw, J.D.; Long, J.H. How Movements of a Non-Humanoid Robot Affect Emotional Perceptions and Trust. Int. J. Soc. Robot. 2020, 13, 1967–1978. [Google Scholar] [CrossRef]
  20. Köhler, W. Gestalt Psychology; The New American Library: New York, NY, USA, 1947. [Google Scholar]
  21. Rudolf, A. Art and Visual Perception; University of California Press: Berkeley, CA, USA, 1974. [Google Scholar]
  22. Plutchik, R. The nature of emotions: Human emotions have deep evolutionary roots. Am. Sci. 2001, 89, 344–350. [Google Scholar] [CrossRef]
  23. Scherer, K.R. What are emotions? And how can they be measured. Soc. Sci. Inf. 2005, 44, 695–792. [Google Scholar] [CrossRef]
  24. Ekman, P. Basic Emotions; John Wiley & Sons: New York, NY, USA, 1999. [Google Scholar]
  25. Plutchik, R. Emotion: A Psychoevolutionary Synthesis; Harper & Row: New York, NY, USA, 1980. [Google Scholar]
  26. Russell, J.A. A Circumplex Model of Affect. J. Personal. Soc. Psychol. 1980, 39, 1161–1178. [Google Scholar] [CrossRef]
  27. Mehrabian, A.; Russell, J.A. An Approach to Environmental Psychology; MIT Press: Cambridge, UK, 1974. [Google Scholar]
  28. Inderbitzin, M.; Väljamäe, A.; Calvo, J.M.B.; Verschure, P.F.M.J.; Bernardet, U. Expression of emotional states during locomotion based on canonical parameters. In Proceedings of the 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), Santa Barbara, CA, USA, 21–25 March 2011; pp. 809–814. [Google Scholar]
  29. Aviezer, H.; Messinger, D.S.; Zangvil, S.; Mattson, W.I.; Gangi, D.N.; Todorov, A. Thrill of Victory or Agony of Defeat? Perceivers Fail to Utilize Information in Facial Movements. Emotion 2015, 15, 791–797. [Google Scholar] [CrossRef] [PubMed]
  30. Bretan, M.; Hoffman, G.; Weinberg, G. Emotionally Expressive Dynamic Physical Behaviors in Robots. Int. J. Hum.-Comput. Stud. 2015, 78, 1–16. [Google Scholar] [CrossRef]
  31. Torre, I.; Goslin, J.; White, L. If your device could smile: People trust happy-sounding artificial agents more. Comput. Hum. Behav. 2020, 105, 106215. [Google Scholar] [CrossRef]
  32. Alemi, O.; Li, W.; Pasquier, P. Affect-expressive movement generation with factored conditional Restricted Boltzmann Machines. In Proceedings of the International Conference on Affective Computing & Intelligent Interaction, Xi’an, China, 21–24 September 2015; pp. 442–448. [Google Scholar]
  33. Hou, S.; Xu, W.; Chai, J.; Wang, C.; Zhang, W.; Chen, Y.; Bao, H.; Wang, Y. A Causal Convolutional Neural Network for Motion Modeling and Synthesis. arXiv 2021, arXiv:2021.12276. [Google Scholar]
  34. Wang, Z.; Mülling, K.; Deisenroth, M.P.; Amor, H.B.; Vogt, D.; Schölkopf, B.; Peters, J. Probabilistic movement modeling for intention inference in human–robot interaction. Int. J. Robot. Res. 2013, 32, 841–858. [Google Scholar] [CrossRef]
  35. Mahadevan, K.; Chien, J.; Brown, N.; Xu, Z.; Parada, C.; Xia, F.; Zeng, A.; Takayama, L.; Sadigh, D. Generative Expressive Robot Behaviors using Large Language Models. In Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, Boulder, CO, USA, 11–15 March 2024; Volume 2401, p. 14673. [Google Scholar]
  36. Wu, J.; Antonova, R.; Kan, A.; Lepert, M.; Zeng, A.; Song, S.; Bohg, J.; Rusinkiewicz, S.; Funkhouser, T. TidyBot: Personalized Robot Assistance with Large Language Models. Auton. Robot. 2023, 47, 1087–1102. [Google Scholar] [CrossRef]
  37. Huang, W.; Xia, F.; Xiao, T.; Chan, H.; Liang, J.; Florence, P.; Zeng, A.; Tompson, J.; Mordatch, I.; Chebotar, Y.; et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In Proceedings of the 6th Conference on Robot Learning, Auckland, New Zealand, 14–18 December 2022; pp. 1769–1782. [Google Scholar]
  38. Du, H.; Herrmann, E.; Sprenger, J.; Fischer, K.; Slusallek, P. Stylistic Locomotion Modeling and Synthesis using Variational Generative Models. In Motion, Interaction and Games; Association for Computing Machinery: Newcastle upon Tyne, UK, 2019; pp. 1–10. [Google Scholar]
  39. Xu, J.; Broekens, J.; Hindriks, K.; Neerincx, M.A. Mood contagion of robot body language in human robot interaction. Auton. Agents Multi-Agent Syst. 2015, 29, 1216–1248. [Google Scholar] [CrossRef]
  40. Saerbeck, M.; Bartneck, C. Perception of affect elicited by robot motion. In Proceedings of the Human-Robot Interaction, Osaka, Japan, 2–5 March 2010; pp. 53–60. [Google Scholar]
  41. Prajod, P.; Hindriks, K. On the Expressivity of a Parametric Humanoid Emotion Model. In Proceedings of the 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Naples, Italy, 31 August–4 September 2020. [Google Scholar]
  42. Xu, J.; Broekens, J.; Hindriks, K.; Neerincx, M.A. Mood expression through parameterized functional behavior of robots. In Proceedings of the 2013 IEEE RO-Man, Gyeongju, Republic of Korea, 26–29 August 2013; pp. 533–540. [Google Scholar]
  43. Claret, J.A.; Venture, G.; Basañez, L. Exploiting the robot kinematic redundancy for emotion conveyance to humans as a lower priority task. Int. J. Soc. Robot. 2017, 9, 277–292. [Google Scholar] [CrossRef]
  44. Hagane, S.; Venture, G. Robotic Manipulator’s Expressive Movements Control Using Kinematic Redundancy. Machines 2022, 10, 1118. [Google Scholar] [CrossRef]
  45. Bhatti, Z.; Shah, A.; Shahidi, F.; Karbasi, M. Forward and Inverse Kinematics Seamless Matching Using Jacobian. arXiv 2014, arXiv:abs/1401.1488. [Google Scholar]
  46. Li, X.M.; Fu, X.L.; Deng, G.F. Preliminary application of the abbreviated PAD emotion scale to Chinese undergraduates. Chin. Ment. Health J. 2008, 22, 327–329. [Google Scholar]
  47. Jiang, N.; Li, R.; Liu, C.; Fang, H. Application of PAD Emotion Model in User Emotional Experience Evaluation. Packag. Eng. 2021, 42, 413–420. [Google Scholar]
  48. Fredrickson, B.L.; Kahneman, D. Duration neglect in retrospective evaluations of affective episodes. J. Personal. Soc. Psychol. 1993, 65, 45–55. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Classification of NHR by kinematics constraints.
Figure 2. Video screenshot of the spider robot's "jump-phase-dominant group".
Figure 3. Video screenshot of the spider robot's "fall-phase-dominant group".
Figure 4. Comparison of arousal levels of different amplitudes within the two groups. ** p < 0.01, *** p < 0.001.
Figure 5. Comparison of arousal levels between the two groups.
Figure 6. The position of arousal results in the emotional space.
Figure 7. Amplitudes of NHR guided by both IK and FK constraints.
Figure 8. Comparison of differences between different amplitudes. ** p < 0.01, *** p < 0.001.
Table 1. Dimension description phrases for arousal.

Question Number | Descriptive Words
Q1 | Wide awake—Sleepy
Q2 | Calm—Excited
Q3 | Stimulated—Relaxed
Q4 | Sluggish—Frenzied
Table 2. Score calculation formula for arousal levels.

Emotional Dimension | Score Calculation Formula
Arousal (A) | A = (−Q1 + Q2 − Q3 + Q4)/4
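As a minimal sketch of the scoring in Table 2 (assuming each item is rated on a symmetric semantic-differential scale, e.g., −4 to +4; the sample ratings are hypothetical):

```python
# Minimal sketch of the arousal scoring in Table 2. Q1 and Q3 are
# reverse-keyed (their left-hand anchors are high-arousal words), hence
# the negative signs. Assumes a symmetric rating scale, e.g., -4 .. +4.
def arousal_score(q1: float, q2: float, q3: float, q4: float) -> float:
    return (-q1 + q2 - q3 + q4) / 4

print(arousal_score(-3, 3, -2, 4))  # -> 3.0, a high-arousal rating
```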
Table 3. Reference table for 14 basic emotional values [47].

Number | Emotional Type | Arousal Value
1 | joy | 1.21
2 | optimism | 1.05
3 | relaxed | −0.66
4 | surprise | 1.71
5 | gentle | −0.79
6 | dependence | −0.81
7 | bored | −1.25
8 | sadness | 0.17
9 | fear | 1.30
10 | anxiety | 0.32
11 | despise | 0.32
12 | disgust | 0.40
13 | anger | 1.10
14 | enmity | 1.00
Table 4. Spearman correlation coefficients between amplitude and arousal in the two groups. ** Correlation is significant at the 0.01 level (two-tailed).

Group | Relationship | R Value | p Value
Jump-phase-dominant group | Amplitude and Arousal | 0.187 | 0.078
Fall-phase-dominant group | Amplitude and Arousal | 0.680 ** | <0.001
Table 5. Spearman correlation between amplitude and arousal. ** Correlation is significant at the 0.01 level (two-tailed).

Relationship | R Value | p Value
Amplitude and Arousal | 0.414 ** | <0.001