1. Introduction
The study of mobility patterns and the formation of complex contact networks remains a cornerstone of epidemic research, providing important insights into the dynamics of disease spread and informing mitigation strategies and public health policies [1,2,3,4]. This importance has been underscored by the global COVID-19 pandemic, during which the analysis of contact networks played a pivotal role in forecasting the virus’s trajectory [5,6,7]. These networks attempt to capture the essence of human interactions, yet they often sacrifice the granularity of individual movements and encounters for high-level abstractions. The difficulty of observing real-world contact networks directly has created a demand for accurate simulation models that can replicate the infection propagation properties of temporal networks observed in different settings.
Advancing our previous work [8], this paper extends the exploration of micro-level contact modeling by integrating sophisticated human mobility models (HMMs). These models are specifically designed to mimic the movement patterns of individuals within constrained spaces, making them well suited for generating temporal contact networks that reflect the nuances of specific locations. Our approach enriches existing methodologies by utilizing temporal-dynamic networks constructed from observed real-world contacts and applying Bayesian optimization to fine-tune the parameters of our HMMs. This optimization ensures that the generated networks closely emulate the infection dynamics of their real-world counterparts.
We build upon our earlier use of simple micro-level encounter models on synthesized networks, now enhanced by the inclusion of real-world data, advanced analytical techniques, and more sophisticated encounter models. This progression allows for a more nuanced understanding of encounter patterns and their implications for epidemic spread. The analysis of topological network features and infection curves, as well as the interpretation of optimized hyperparameters, represent significant parts of our methodology. These advancements can improve the accuracy of epidemic forecasting by integrating location-specific micro-level infection characteristics into large-scale infection spreading simulations. Notably, existing pandemic simulation models, such as OpenABM [9], Covasim [10], and MEmilio [11], oversimplify infection propagation principles when modeling contacts and infection transmissions in confined spaces. Our approach accounts for the unique characteristics of different confined environments, enabling improved epidemic forecasting simulations, and can therefore support investigations of public health interventions and contact-tracing efforts. The key contributions of our paper include the following:
The generation of temporal-dynamic networks that are based on a range of micro-level encounter models. This includes advanced HMMs designed to simulate infection properties as well as network characteristics in confined locations with high fidelity, alongside simpler models.
The introduction of Bayesian optimization for hyperparameter selection in HMMs, a novel approach aimed at generating temporal-dynamic networks for confined spaces. This strategy focuses on accurately replicating the infection propagation dynamics observed in real-world contact networks, serving as a cornerstone in enhancing the realism and relevance of large-scale epidemic simulations.
The employment of various network metrics for both the optimization of our models and their comprehensive evaluation, coupled with a thorough analysis that demonstrates the capability to effectively parameterize HMMs using real-world network data. This integrated approach not only validates the effectiveness of our networks in mimicking real-world phenomena but also identifies certain models as particularly well-suited for specific types of locations.
The structure of this paper is organized as follows: Section 2 sheds light on existing methods employed in micro-level encounter modeling and explores HMMs as generators of temporal-dynamic networks. The methodology section, Section 3, then introduces our approaches to modeling micro-level contacts as well as the data used in our experiments. The results section, Section 4, compares how well the various techniques we employed generate contact networks with realistic infection propagation properties. We discuss our findings, the limitations of our approaches, and their potential use for future pandemic modeling in Section 5. Finally, Section 6 summarizes our work and provides an outlook on future work.
3. Methodology
This section outlines the core methodologies that support our study of micro-level interaction modeling via temporal-dynamic networks. Initially, the infection propagation model used throughout this study as well as the real-world network data are presented. We then proceed to describe methods for modeling micro-level contacts. This spectrum includes simplistic, naive approaches as foundational baselines, alongside more complex techniques that make use of HMMs.
3.1. Susceptible–Infectious–Recovered Model
The overall goal of this study is the generation of realistic contact networks for confined spaces that reflect the infection propagation properties of their real-world counterparts. For the parameterization of the HMMs used in this study, as well as for the comparison of the resulting contact networks, we employed the Susceptible–Infectious–Recovered (SIR) model [25]. The SIR model is a well-established compartmental model used to analyze the spread of infectious diseases within a population. It divides individuals into three compartments: susceptible (S), infectious (I), and recovered (R). The SIR model tracks the transitions of individuals between these compartments based on their interactions and the disease’s transmission dynamics. For our evaluation, we utilized a temporal-dynamic SIR model implemented in the Tacoma framework (https://github.com/benmaier/tacoma, accessed on 30 July 2024). Tacoma provides a versatile platform for studying epidemic spreading and other dynamical processes on temporal networks using the Gillespie algorithm [26]. We ran the epidemic spreading simulations for a simulated period of 35 days. During this time, we monitored the progression of the infection within the population and observed how different modeling approaches influenced the spread of the disease. This SIR-based evaluation allowed us to gain insights into the impact of micro-level encounter modeling on the topological properties of contact networks and the resulting epidemic dynamics.
To ensure adequate infection dynamics in our simulations, we set a uniform recovery rate across all networks, corresponding to a mean infectious period of 7 days, i.e., an infected node typically recovers within a 7-day period. By selecting transmission rates that result in roughly 20% of nodes being infectious at the infection peak, we maintained robust dynamics across the experiments. The chosen transmission rates for the networks in our study (high school, primary school, office, and supermarket) were , , , and , respectively. Using location-specific transmission rates allowed us to simulate realistic infection dynamics by accounting for the distinct nature of each network while ensuring that the desired infection peak was reached.
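To make the simulation procedure more concrete, the following is a minimal sketch of a Gillespie-style SIR simulation on a temporal network. It is not Tacoma’s implementation. The network is assumed to be a list of piecewise-constant snapshots `(t_start, t_end, edges)`, and the names `gillespie_sir`, `beta` (infection rate per S–I edge), and `rho` (recovery rate per infectious node) are hypothetical. Within a snapshot all rates are constant. Because exponential waiting times are memoryless, a draw that overshoots the end of a snapshot can be discarded and redrawn in the next snapshot.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]
Snapshot = Tuple[float, float, List[Edge]]  # (t_start, t_end, edges active in between)


def gillespie_sir(
    n_nodes: int,
    snapshots: List[Snapshot],
    beta: float,
    rho: float,
    initially_infected: Set[int],
    seed: int = 0,
) -> List[Tuple[float, int]]:
    """Simulate SIR on a piecewise-constant temporal network (hypothetical helper).

    beta: infection rate per S-I edge; rho: recovery rate per infectious node.
    Returns (time, number of infectious nodes) after every event.
    """
    rng = random.Random(seed)
    state = ["S"] * n_nodes
    for node in initially_infected:
        state[node] = "I"
    n_inf = len(initially_infected)
    curve = [(snapshots[0][0], n_inf)]

    for t_start, t_end, edges in snapshots:
        t = t_start
        while n_inf > 0:
            # Collect S-I edges, oriented as (susceptible, infectious).
            si_edges = [
                (u, v) if state[u] == "S" else (v, u)
                for u, v in edges
                if {state[u], state[v]} == {"S", "I"}
            ]
            infectious = [v for v in range(n_nodes) if state[v] == "I"]
            infection_rate = beta * len(si_edges)
            total_rate = infection_rate + rho * len(infectious)
            if total_rate == 0:
                break  # nothing can happen in this snapshot
            t += rng.expovariate(total_rate)
            if t >= t_end:
                # Memorylessness: drop the overshooting draw, redraw in the next snapshot.
                break
            if rng.random() * total_rate < infection_rate:
                target, _ = rng.choice(si_edges)
                state[target] = "I"
                n_inf += 1
            else:
                state[rng.choice(infectious)] = "R"
                n_inf -= 1
            curve.append((t, n_inf))
        if n_inf == 0:
            break
    return curve
```

With snapshot times measured in days, `rho = 1/7` corresponds to the 7-day mean infectious period described above. The transmission rate would then be tuned per location until the infectious peak reaches roughly 20% of the nodes.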
3.2. Dynamic Network and Mobility Data
In this work, we utilized two distinct data sources to inform and test our models: temporal-dynamic networks from socio-patterns and supermarket mobility data. We now detail these sources and discuss the associated technical limitations. To conduct comprehensive SIR simulations across several days, we had to address the limited availability of accurate long-term mobility data. Our approach involved stacking the temporal contact network data of single days to simulate a continuous span of 35 days. This step was necessary to ensure a fair comparison between different locations, as the recording periods of the temporal network datasets used in this study varied considerably. In environments where individuals typically have assigned, consistent interactions (e.g., most offices and schools), this method provides a reasonable approximation. While this approach may not fully capture long-term fluctuations, it still allowed us to create contact networks with meaningful infection dynamics and to identify the infection potential specific to confined spaces. Conversely, in locations with high variability in individual movements and interactions, such as supermarkets, the simulated infection dynamics likely deviate more from actual dynamics. We discuss this further in the context of our supermarket network below.
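The day-stacking step can be sketched as follows. This assumes that a single day of contacts is stored as `(time_in_seconds, i, j)` tuples with times measured from the start of that day, and that node IDs refer to the same individuals on every day. The function name `stack_days` is hypothetical.

```python
from typing import List, Tuple

Contact = Tuple[float, int, int]  # (time in seconds since start of day, node i, node j)

SECONDS_PER_DAY = 24 * 60 * 60


def stack_days(day: List[Contact], n_days: int = 35) -> List[Contact]:
    """Repeat one day of contacts n_days times, shifting each copy by one day.

    Node IDs are kept, i.e., the same individuals are assumed to return every day.
    """
    return [
        (t + d * SECONDS_PER_DAY, i, j)
        for d in range(n_days)
        for (t, i, j) in day
    ]
```

For example, a day containing two contacts becomes a 35-day contact list with 70 entries. Each copy is offset by 86,400 s.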
3.2.1. Socio-Patterns Network Data
We incorporated three real-world datasets for specific settings: a high school network [27], a primary school network [28], and an office network [29]. These temporal-dynamic networks consist of interactions between individuals over different periods. The datasets do not record exact arrival, departure, and duration times; these must be inferred from the timing of the first and last interactions of each node. This means that, for each person, the arrival time is marked by the occurrence of their first edge, and the departure time is set by when their last interaction ends. Due to the experimental setup described in these studies, the data capture only face-to-face contacts that occur within a range of up to 1.5 m. Our analysis concentrated exclusively on the recorded physical encounters included in the datasets. We adopted the temporal resolution of 20 s given by the empirical networks. For the purpose of this study, we refer to real-world networks as networks that have been constructed from observed real-world contacts [27,28,29].
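The inference of arrival and departure times from the first and last edges can be sketched as follows. This assumes that each record `(t, i, j)` marks a contact during the 20-s window starting at `t`. If a dataset timestamps the end of the window instead, the offsets shift accordingly. The function name `presence_intervals` is hypothetical.

```python
from typing import Dict, Iterable, Tuple

RESOLUTION = 20  # seconds, temporal resolution of the empirical networks


def presence_intervals(
    contacts: Iterable[Tuple[int, int, int]],
) -> Dict[int, Tuple[int, int]]:
    """Map node -> (arrival, departure) inferred from its first and last contact.

    Each contact (t, i, j) is assumed to cover the window [t, t + RESOLUTION).
    """
    intervals: Dict[int, Tuple[int, int]] = {}
    for t, i, j in contacts:
        for node in (i, j):
            start, end = intervals.get(node, (t, t + RESOLUTION))
            intervals[node] = (min(start, t), max(end, t + RESOLUTION))
    return intervals
```

The node is then treated as present for the whole interval, including long stretches without any recorded contact.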
3.2.2. Supermarket Network Data
In [30], encounters of supermarket visitors were recorded during the COVID-19 pandemic. In addition to encounters, the exact arrival and departure times are provided. The data encompass approximately five hours of encounters.
In contrast to the socio-patterns datasets, which were recorded with RFID-based radio devices, the supermarket dataset was created using ultra-wideband technology, providing significantly higher sensitivity. Despite this improvement, individuals still wore the devices in front of their bodies, leading to potential limitations in detecting interactions when people were positioned behind one another. As a result, we assumed a field of view similar to that of the other datasets, i.e., that the devices detect encounters only within this range. To align with the temporal resolution of the socio-patterns networks, we aggregated the contacts of the supermarket network into 20-s intervals.
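The 20-s aggregation can be sketched as follows. This assumes that encounters are given as `(t_start, t_end, i, j)` in seconds and that a 20-s bin counts as a contact if an encounter overlaps it at all. Runs of consecutive active bins are then merged into contacts whose durations are multiples of 20 s. Both function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

BIN = 20  # seconds, temporal resolution of the socio-patterns networks


def aggregate_to_bins(
    encounters: Iterable[Tuple[float, float, int, int]],
) -> Dict[Tuple[int, int], Set[int]]:
    """Map each unordered node pair to the indices of the 20-s bins it was in contact in.

    encounters: (t_start, t_end, i, j) in seconds. A bin [k*BIN, (k+1)*BIN) counts as
    active if the encounter overlaps it (or, if instantaneous, falls into it).
    """
    bins: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
    for t_start, t_end, i, j in encounters:
        first = int(t_start // BIN)
        last = max(first, math.ceil(t_end / BIN) - 1)
        bins[(min(i, j), max(i, j))].update(range(first, last + 1))
    return dict(bins)


def contact_durations(active_bins: Set[int]) -> List[int]:
    """Durations in seconds of maximal runs of consecutive active bins."""
    durations: List[int] = []
    run, prev = 0, None
    for b in sorted(active_bins):
        if prev is not None and b == prev + 1:
            run += 1
        else:
            if run:
                durations.append(run * BIN)
            run = 1
        prev = b
    if run:
        durations.append(run * BIN)
    return durations
```

For example, an encounter from 5 s to 45 s touches the first three bins and yields a single 60-s contact. A later, separate encounter starts a new contact.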
In our study, which examines the spread of infections within confined spaces, supermarkets were not ideal locations for analysis because the infrequent visits by individuals inhibit the initiation of infection dynamics. To mitigate this limitation, we also used receiver IDs assigned to participants during the experiment to keep track of nodes over multiple days (as explained above, we stacked single days of network data into longer time spans). This approach allowed us to utilize the supermarket network as a proxy for locations that attract the same group of visitors but exhibit highly random movement patterns compared to schools or offices. In general, locations with infrequent individual appearances are not conducive to modeling the propagation of infections over extended durations and are, therefore, more effectively integrated into large-scale simulation frameworks. Nevertheless, our analysis benefitted from including the supermarket case, as it introduced interactions of a different nature than those in the socio-patterns networks, allowing us to explore the adaptability and optimization of our models for differently characterized networks.
3.3. Micro-Level Contact Modeling
This section introduces various micro-level contact models. We first recapitulate naive approaches from our prior work [8] before advancing to new contact modeling techniques that leverage HMMs and Bayesian optimization.
While aggregated mobility data may be available, detailed micro-level movements typically are not, leaving a gap in accurately modeling the nuanced interaction patterns that influence infection dynamics. By bridging this gap, micro-level models can enhance the precision of infection forecasting. In the context of this study, micro-level encounter modeling aims to generate temporal networks that reflect the infection dynamics and certain network properties of observed real-world temporal networks. All models discussed take arrival and departure times as input and produce temporal contact networks that have the same overall node counts but differ in the number and duration of edges due to varying movement patterns.
The real-world datasets used in this study provided an actual temporal network as a baseline for validation, which is essential for the parameterization of HMMs. As already pointed out, we inferred arrival and departure times in these cases based on a node i’s first and last edge appearance, assuming that nodes did not leave the location between contacts.
Surprisingly, a significant number of brief edge durations could be observed when nodes that had been isolated for longer periods, such as two hours, were completely removed from the network. Apparently, nodes spent considerable time without any recorded contact, even in school environments. This is likely due to the face-to-face nature of the empirical networks. In [31], a technical setup similar to that of the empirical networks used in this study was employed, but the focus was on temporal networks at scientific conferences. The study found that contacts were rare during presentations, even in crowded rooms, because attendees generally do not face each other. This implies that, in other environments, such as offices and high schools, close proximity alone does not guarantee that contacts are recorded by radio devices. To ensure that our models accurately represented this aspect, we chose to model nodes continuously from their first appearance to their last in the empirical network, without removal, even if they showed no contacts for prolonged periods.
3.3.1. Naive Micro-Level Encounter Models
Baseline approach (BASE): our baseline approach is the simplest and most intuitive way to build contact networks from the arrival and departure times of individuals at certain locations. A similar approach was described by [12]. In essence, this method leverages mobility data and individual-specific time allocations at specific locations to compute intersecting time frames between individuals and subsequently constructs contact networks. Individuals present at the same location are linked by edges in a contact network. Transforming this concept into a temporal-dynamic network, we established edges connecting pairs of individuals who coincide at a given point in time within the same location (see Section 2.3). Under this premise, our approach assumes an equal likelihood of infection for any pair of individuals who share the same duration of stay at a location. In other words, BASE constructs a fully connected network between all nodes active at time t. Additionally, the contact intensity w influences the weight of an edge. This parameter w is assumed to be constant for all edges and is determined by the type of location; e.g., locations with many social interactions, such as kindergartens or cafes, are assumed to have a higher w value than libraries or supermarkets. A notable difference between our approach and the approach suggested by [12] is that, in our case, nodes are always connected to all other active nodes, whereas, in their case, the number of contacts was capped at 20. While the two approaches generate very different networks for large locations, they are identical for small locations where the number of nodes stays below 20 for most of the time. This simplified framework formed the foundation of our exploration, serving as a reference point against which we compared our more intricate modeling techniques.
Random graph-based approach (RAND): in our random graph-based approach, similar to [9], every possible edge, meaning any pair of nodes i and j present at the location at time t, is selected with probability
. Additionally, a contact duration is drawn from a Pareto distribution
. Contacts, therefore, have a minimum duration of one time step and follow a power law determined by the shape parameter
. This distribution accounts for the variable nature of interaction durations, resulting in a dynamic and realistic representation of human encounters when the recurrence of contacts is completely random. A possible application would be locations where interactions form mainly due to uncorrelated movement rather than social relations, such as supermarkets, where two individuals are unlikely to stay close to each other for the entire shopping trip but frequent short contacts are to be expected.
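The RAND procedure can be sketched as follows. The names `p` (edge selection probability) and `alpha` (Pareto shape) are hypothetical placeholders for the parameters above. The sketch also assumes that a pair already in contact is not sampled again until its current contact ends, and that contacts end early when either node leaves. Python’s `random.paretovariate` samples a Pareto distribution with minimum 1, so flooring it gives durations of at least one time step.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Tuple


def rand_network(
    presence: Dict[int, Tuple[int, int]],
    n_steps: int,
    p: float,
    alpha: float,
    seed: int = 0,
) -> List[Tuple[int, int, int, int]]:
    """Return contacts (i, j, start, end), end exclusive, in time steps.

    At every step, each co-present pair that is not already in contact starts a
    contact with probability p; its duration is floor(Pareto(alpha)) >= 1 steps,
    truncated when either node leaves (presence intervals are half-open).
    """
    rng = random.Random(seed)
    busy_until: Dict[Tuple[int, int], int] = {}
    contacts = []
    for t in range(n_steps):
        active = sorted(n for n, (a, d) in presence.items() if a <= t < d)
        for i, j in combinations(active, 2):
            if busy_until.get((i, j), 0) > t or rng.random() >= p:
                continue
            duration = math.floor(rng.paretovariate(alpha))  # heavy-tailed, >= 1
            end = min(t + duration, presence[i][1], presence[j][1])
            busy_until[(i, j)] = end
            contacts.append((i, j, t, end))
    return contacts
```

Smaller values of `alpha` produce heavier tails, i.e., occasional long contacts among many short ones.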
Clique-based approach (CLI): we extended the clique-based strategy of [13] for micro-level encounter modeling. To construct cliques, we utilized a combination of spatial dynamics and contact patterns. First, individuals are assigned to sub-spaces within the location, with the parameterizable
mirroring the number of individuals per space. Whenever a node is introduced into the location, it stays in its space and forms connections to all nodes present in this space, forming tight cliques. To allow contacts between cliques at every time step, a node changes its space with probability
for a duration that is drawn from a normal distribution
. Afterward, the node returns to its default space.
Clique-based approach with random substructure (CLI+RAND): cliques from CLI can be imagined as classrooms, offices, or apartments in residential buildings. Large values for will generate a high number of contacts, e.g., a classroom with 30 students already generates 435 edges, at every time frame, due to their fully connected nature. To lower the density of the clique networks and allow for edge changes, we added our RAND approach to sit on top of the clique structure. Contacts within cliques are now randomly sampled according to the procedure explained in RAND. This leads to the formation of cliques with an adjustable density, where individuals have pronounced edges connecting them within the clique, reflecting intensive interactions. In contrast, connections outside the clique are rare, mirroring more sporadic or distant interactions. The underlying idea of this approach is to encapsulate the nuanced interplay between spatial arrangements and interpersonal encounters.
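Both clique-based variants can be sketched as follows. The parameter names `n` (individuals per space), `q` (probability of visiting another space), `mu`/`sigma` (normal visit duration), and `p` (within-space edge probability) are hypothetical placeholders. For brevity, all nodes are assumed to be present for the whole simulation and are assigned to default spaces in list order. The CLI+RAND variant is simplified to an independent per-step thinning of intra-space pairs instead of the full RAND duration sampling.

```python
import random
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple


def clique_snapshots(
    nodes: List[int],
    n_steps: int,
    n: int,
    q: float,
    mu: float,
    sigma: float,
    p: Optional[float] = None,
    seed: int = 0,
) -> List[Set[Tuple[int, int]]]:
    """Edge sets per time step for CLI (p=None) or a simplified CLI+RAND variant.

    Nodes are assigned to default spaces of size n in list order. Every step, a
    node at home visits a random other space with probability q for
    max(1, round(N(mu, sigma))) steps. With p set, each intra-space pair is kept
    independently with probability p.
    """
    rng = random.Random(seed)
    home = {v: k // n for k, v in enumerate(nodes)}
    n_spaces = max(home.values()) + 1
    away: Dict[int, Tuple[int, int]] = {}  # node -> (visited space, return step)
    snapshots = []
    for t in range(n_steps):
        members: Dict[int, List[int]] = defaultdict(list)
        for v in nodes:
            if v in away and away[v][1] <= t:
                del away[v]  # visit over, back to the default space
            if v not in away and n_spaces > 1 and rng.random() < q:
                target = rng.choice([s for s in range(n_spaces) if s != home[v]])
                stay = max(1, round(rng.gauss(mu, sigma)))
                away[v] = (target, t + stay)
            members[away[v][0] if v in away else home[v]].append(v)
        edges = set()
        for group in members.values():
            for i, j in combinations(sorted(group), 2):  # clique within each space
                if p is None or rng.random() < p:
                    edges.add((i, j))
        snapshots.append(edges)
    return snapshots
```

With 30 nodes in a single space and no visits, every snapshot contains 30·29/2 = 435 edges, matching the classroom example above. Setting `p` below 1 thins these cliques.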
3.3.2. Human Mobility-Based Micro-Level Encounter Models
To build temporal contact networks from HMMs, we used an open-source implementation of RWP and TLW (https://github.com/panisson/pymobility, accessed on 30 July 2024) that follows the model description provided in Section 2. The STEPS model was integrated with RWP to form the combined STEPS+RWP model, as detailed in Section 2.2. Since all models need a confined area for nodes to walk in, we built synthetic locations according to our empirical networks. To this end, we inferred the capacity C of each location. For the primary school, the high school, and the office networks, we assumed the capacity of the location to be equal to the number of participants in the experiment. The capacity of the supermarket network was defined as the peak number of active nodes across all time steps, which resulted in a capacity of 44. To determine the area based on a location’s capacity, we utilized values for location-dependent space per person from [13], denoted as
. We constructed a square surface based on these values, calculating its area
A as
. For the STEPS-based models, we additionally sub-structured the area into
sub-spaces in the horizontal and vertical direction, where
denotes the number of nodes (see
Table 1). This ensures a uniform distribution of sub-spaces across the entire area, resulting in some sub-spaces potentially containing fewer nodes than the default
.
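The geometry construction can be sketched as follows. The exact formulas above were lost in extraction, so this sketch rests on two explicit assumptions. First, the square area is taken to be capacity times space per person. Second, the grid has m sub-spaces per axis with m = ceil(sqrt(C / n)), where n is the default number of nodes per sub-space. Taking the ceiling would explain why some sub-spaces end up with fewer nodes than the default. The function name and the example values are hypothetical.

```python
import math
from typing import Tuple


def location_geometry(capacity: int, space_per_person: float, n: int) -> Tuple[float, int, float]:
    """Return (side length in m, sub-spaces per axis, sub-space side length in m).

    Assumes a square area A = capacity * space_per_person and an m x m grid of
    sub-spaces with m = ceil(sqrt(capacity / n)); both are assumptions.
    """
    area = capacity * space_per_person
    side = math.sqrt(area)
    m = math.ceil(math.sqrt(capacity / n))
    return side, m, side / m
```

For instance, 36 people with a hypothetical 1 m² each and n = 4 give a 6 m × 6 m square split into a 3 × 3 grid of 2 m × 2 m sub-spaces.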
In all HMM-based approaches, we continuously tracked node movements during the simulation. A contact between two nodes i and j is generated if both contact conditions
are fulfilled. Cond.1 ensures that, for nodes positioned at
and
, the distance between them must be smaller than a specified maximum distance
to result in an edge. To efficiently find all node pairs within this distance at every time step, we used a k-d tree with the standard Euclidean distance. Second, we determined whether the lines of sight of both nodes were aligned, accounting for the experimental design outlined in Section 3.2.1. Therefore, we defined their line-of-sight vectors,
and
, which are always parallel to the latest movement. For all nodes that pass condition Cond.1, the angles between their lines of sight and the connecting vector
are calculated. If those angles are smaller than or equal to half of their field of view
, then conditions Cond.2a and Cond.2b are fulfilled, and a contact between nodes i and j is generated. Following the conditions of the original studies outlined in Section 3.2, we set the field of view to 120 degrees, a typical value for the human binocular field of view, and the maximum contact distance to 1.5 m. The simulation time step was one second. To align with the temporal resolution of the empirical networks, we established a contact if nodes met at least once within a 20-s window. Consequently, a contact was considered to have ended if the nodes lost contact for at least 20 s.
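The two contact conditions for a single time step can be sketched with SciPy’s `cKDTree.query_pairs`, which returns all index pairs within a given radius. The sketch assumes that positions and line-of-sight vectors are `(N, 2)` NumPy arrays and that every line-of-sight vector is non-zero. Because `query_pairs` includes pairs at exactly the radius, distances are re-checked against the strict inequality of Cond.1. Coinciding nodes are skipped because their angles are undefined; this is a choice made for the sketch. The function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def _angle(u: np.ndarray, v: np.ndarray) -> float:
    """Angle in radians between two non-zero 2-D vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def face_to_face_contacts(
    positions: np.ndarray,
    headings: np.ndarray,
    d_max: float = 1.5,
    fov_deg: float = 120.0,
) -> set:
    """Return pairs (i, j), i < j, that are closer than d_max and face each other.

    headings: direction of each node's latest movement, used as its line of sight.
    """
    half_fov = np.deg2rad(fov_deg) / 2.0
    contacts = set()
    # Cond.1 candidates: the k-d tree returns all pairs within distance d_max.
    for i, j in cKDTree(positions).query_pairs(r=d_max):
        v_ij = positions[j] - positions[i]  # connecting vector from i to j
        dist = np.linalg.norm(v_ij)
        if dist == 0.0 or dist >= d_max:  # strict inequality; skip coinciding nodes
            continue
        # Cond.2a / Cond.2b: each node sees the other within half its field of view.
        if _angle(headings[i], v_ij) <= half_fov and _angle(headings[j], -v_ij) <= half_fov:
            contacts.add((int(i), int(j)))
    return contacts
```

Running this at every one-second step and marking a 20-s window as a contact whenever a pair appears at least once inside it reproduces the aggregation rule described above.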
3.3.3. Bayesian Optimization for Hyperparameter Selection
To perform hyperparameter optimization, we used the Optuna framework described in [32]. Optuna employs Bayesian optimization algorithms to identify a set of parameters from a specified search space that minimizes a designated objective function. A comprehensive table detailing all tuned parameters and their respective ranges can be found in Appendix A.1. To evaluate the error of a specific model, we focused on metrics that assess the similarity between the infection dynamics of the empirical network
and the modeled network
. The infection dynamics were calculated as outlined in Section 3.1. Here,
and
denote the number of infected nodes for
and
, respectively. We measured both the difference in infection peaks, denoted as
, and the timing difference of these peaks, expressed as
, using the following formula:
Accordingly, a model achieving a low value for
closely replicates the peak number of infections observed in the empirical network. When a network yields a small value for
, it suggests that the timing of the peaks in both the model and the empirical network align closely, demonstrating that the model effectively captures the empirical infection dynamics. In general, topologically different networks can generate very similar infection dynamics. To address this, we also took into account the overall number of edges generated as well as the similarity in the contact duration distributions. The relative difference in the total number of edges between two networks is defined as
where
is calculated using
To assess the similarity between two contact duration distributions, we utilized the well-known two-sample Kolmogorov–Smirnov (KS) test. The KS test quantitatively determines whether two samples stem from significantly different one-dimensional probability distributions; its statistic is the maximum absolute difference between the two empirical cumulative distribution functions. We computed the difference in the contact duration distributions with
After conducting exploratory experiments with various weights and observing the typical value ranges of each metric, we defined the objective function
with the following weights:
All models in our framework except BASE are stochastic. Consequently, the objective value not only depends on the parameters selected from the search space but also varies around a mean value. Additionally, SIR runs are stochastic and need to be executed multiple times. To balance the stochastic variations of the SIR runs and the network construction, we generated 20 network realizations for a given set of parameters and conducted 250 SIR runs per network, a number of runs that resulted in stable infection peaks during our tests. The mean of the 20 objective values (one per network realization) was taken as the objective function value for that trial. For each model, a total of 150 trials was run to search the parameter space for the optimal parameter set.
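The structure of one trial evaluation can be sketched in plain Python as follows. The exact formulas and weights of the objective were lost in extraction, so the normalizations below are assumptions. They use relative peak-height and edge-count differences, a peak-timing difference normalized by the empirical peak time, and the two-sample KS statistic (the same statistic returned by `scipy.stats.ks_2samp`). The weights in the usage example are hypothetical. `build_and_evaluate` stands for any function that builds one network realization for a given seed and returns its metric components.

```python
import statistics
from bisect import bisect_right
from typing import Callable, Dict, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)) for x in a + b)


def metric_components(infected_emp, infected_mod, n_edges_emp, n_edges_mod,
                      durations_emp, durations_mod) -> Dict[str, float]:
    """Assumed metric definitions; infected_* are mean infectious counts per time step."""
    peak_emp, peak_mod = max(infected_emp), max(infected_mod)
    t_emp = list(infected_emp).index(peak_emp)
    t_mod = list(infected_mod).index(peak_mod)
    return {
        "peak": abs(peak_mod - peak_emp) / peak_emp,
        "peak_time": abs(t_mod - t_emp) / max(t_emp, 1),
        "edges": abs(n_edges_mod - n_edges_emp) / n_edges_emp,
        "ks": ks_statistic(durations_emp, durations_mod),
    }


def cost(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the metric components."""
    return sum(weights[k] * v for k, v in components.items())


def trial_value(build_and_evaluate: Callable[[int], Dict[str, float]],
                weights: Dict[str, float], n_networks: int = 20) -> float:
    """Mean cost over n_networks realizations of one parameter set."""
    return statistics.fmean(cost(build_and_evaluate(s), weights) for s in range(n_networks))
```

In Optuna, a function like `trial_value` would be called inside the objective passed to `study.optimize(objective, n_trials=150)`, with the model parameters drawn via `trial.suggest_float` or `trial.suggest_int`.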
Due to the stochastic nature of the models, even the optimal parameter set yields slightly varying objective function values. To account for this, we conducted a final test with 21 networks generated using the optimal parameters. The definitive value for
was determined as the median among these 21 networks. All model evaluations within our experiments were applied to these resulting median networks, including final SIR runs with
iterations.
Appendix A.1 lists all the model parameters used in the Bayesian optimization process.
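The final median-network selection can be sketched as follows. The function and callback names are hypothetical. An odd number of realizations is required so that the median cost belongs to an actual network rather than being an average of two.

```python
import statistics
from typing import Callable, Tuple, TypeVar

Network = TypeVar("Network")


def median_network(
    build_network: Callable[[int], Network],
    evaluate_cost: Callable[[Network], float],
    n_networks: int = 21,
) -> Tuple[Network, float]:
    """Build n_networks realizations and return the one whose cost is the median."""
    if n_networks % 2 == 0:
        raise ValueError("use an odd number so that the median is an actual realization")
    networks = [build_network(seed) for seed in range(n_networks)]
    costs = [evaluate_cost(net) for net in networks]
    median_cost = statistics.median(costs)  # middle element for odd-length input
    return networks[costs.index(median_cost)], median_cost
```

All subsequent evaluations, including the final SIR runs, would then use the returned network.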
4. Results
This section presents findings from experiments that examined the impact of different contact patterns on infection dynamics across various scenarios. We used HMMs and optimized them according to our proposed methodology. This approach enabled us to construct contact networks that replicate infection dynamics and network characteristics observed in real-world settings. We then compared the cost values of the different models and related them to the corresponding SIR dynamics. We also analyzed the network properties and outlined the parameters derived from the optimization process.
Following the methodologies described in Section 3.3.2 and Section 3.3.3, we fine-tuned the parameters of the HMMs. For each network, we performed a number of SIR simulations equal to ten times its number of nodes to ensure statistically robust outcomes. We used the high school, primary school, office, and supermarket networks introduced in Section 3.2.1 and Section 3.2.2.
Figure 1 presents the cost of each method as given by the objective function used in the optimization. The STEPS and STEPS+RWP approaches consistently achieve the lowest costs across most types of locations, followed by CLI+RAND and RAND. Notably, for TLW, RWP, and BASE, performance varies significantly with location type. For instance, TLW achieves costs lower than 1.5 for the supermarket location but exceeds a cost of 10 at other locations. This variability highlights the location-dependent performance of models, a topic we explore further in Section 5. Overall, the high school network presents the greatest challenge for the tested models, followed by the office network. Conversely, the primary school and supermarket networks yield the lowest costs across all tested models.
Although inherent randomness in both the SIR evaluations and our network models may lead to deviations from the median in the final 21 runs, the results are consistent. The 95% confidence intervals do not exceed 13% above or below the median value. This indicates that our models’ performance is stable, as 95% of the expected cost values fall within the error bars shown in Figure 1.
Infection dynamics are driven by the transmission probability parameter and the recovery probability parameter. To verify that our top model, STEPS, performs well irrespective of these parameters, we examined its performance across various combinations of the two. Specifically, we tested seven different transmission values for each location, combining each of them with three recovery values that correspond to recovery on average after 4, 7, and 10 days. The cost fluctuated only minimally, indicating that STEPS adapts well to different transmission and recovery values. For detailed results, see Appendix A.4.
Table 2 details the parameters of the STEPS and STEPS+RWP models, as discussed in Section 3.3.2, optimized using the Bayesian optimization strategy detailed in Section 3.3.3. For the high school network, the STEPS model yields a default number of nodes per sub-space of 27, compared to 39 for the primary school, suggesting a higher density of individuals per unit space in the primary school. For the STEPS+RWP model, this value is 21 for the high school and 22 for the primary school, showing no pronounced difference.
In the case of the STEPS model, the attractor strength k is greater for the primary school, with a value of 9.974 compared to 4.387 for the high school. For STEPS+RWP, the primary school has an attractor strength of 7.870 compared to 8.128 for the high school. While no strong difference is observable for STEPS+RWP, the attractor strength of STEPS suggests that individuals in primary schools are more tightly bound to specific spaces and less likely to change spaces than those in high schools. Besides the number of nodes per sub-space and k, the models show similarities in the shape parameter of the Pareto distribution for pause times. A larger shape parameter results in more movement, as the pause times between movements decrease. Consequently, the higher this value, the more short-term contacts occur. Both models show higher values for the primary school than for the high school, indicating that short-term contacts are somewhat more likely in the primary school setting. The STEPS model in particular, which shows the lowest cost for most locations, reflects the distinct characteristics of high schools and primary schools by accounting for shorter, more frequent contacts in the primary school, along with a tighter adherence of primary school children to their default spaces.
In the supermarket scenario, the values for both STEPS and STEPS+RWP are on a similar scale, with 20 for STEPS and 24 for STEPS+RWP. However, the attractor strengths between the two models differ strongly. STEPS has an attractor strength of 2.387, while STEPS+RWP uses a much stronger value of 9.161. Despite this, the values are similarly high, with STEPS at 2.887 and STEPS+RWP at 2.172. Furthermore, the parameter , used only in STEPS+RWP, is considerably lower in the supermarket scenario (24 s) compared to other locations, where it exceeds 1000 s. This constant movement causes individuals to have very short inter-contact periods, while also experiencing very brief contacts. Besides , clearly exceeds for the supermarket compared to the other locations. With for STEPS, and for STEPS+RWP, both models introduce a high number of short-term contacts. In contrast, both models show values ranging from 0.1 to 0.8 for all other locations. These differences align with the nature of the underlying location characteristics: supermarkets are characterized by frequent short-term encounters, while schools and offices tend to have longer-lasting interactions. In the office network, the k values are relatively low, with 2.881 for STEPS and 5.120 for STEPS+RWP. The parameter is notably higher in this setting for STEPS, at 0.768, compared to other locations. This suggests that the STEPS approach indicates a higher frequency of short-term contacts in offices than in school environments, but significantly less than those observed in supermarket scenarios. The STEPS+RWP model has an value of 0.346, positioning it between the values found in high schools and primary schools.
Figure 2 shows the SIR curves of the temporal contact networks derived from the various encounter models, illustrating how the computed costs relate to the infection dynamics that the parameterized models produce. The legend of each subplot gives the corresponding cost for each model. Models with a cost of up to 1 replicate the infection dynamics precisely in terms of both timing and extent, e.g., STEPS in the primary school scenario. A cost between 1 and 2 still indicates some similarity to the infection propagation properties of the real-world network, as for STEPS in the office network, although deviations from the baseline infection dynamics are visible. Higher costs tend to produce infection dynamics that diverge significantly from the real-world dynamics, as observed for RWP in the high school scenario. Overall, our methodology proved effective: by combining Bayesian optimization with HMMs, we obtained encounter models that generate temporal networks reflecting the properties of realistic, location-specific infection dynamics.
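The SIR curves discussed above come from running an epidemic process on a temporal contact network. The exact SIR procedure of the study is defined elsewhere in the paper and is not reproduced here; the following is only a minimal sketch, assuming discrete time steps, contacts given as hypothetical (t, i, j) tuples, a per-contact transmission probability beta, and a per-step recovery probability gamma. None of these names or choices are taken from the authors' implementation.

```python
import random
from collections import defaultdict


def simulate_sir(contacts, n_nodes, beta, gamma, seed_node, rng):
    """Discrete-time SIR on a temporal contact network (illustrative sketch).

    contacts: list of (t, i, j) tuples, one per active edge per time step.
    beta:     assumed transmission probability per contact and time step.
    gamma:    assumed recovery probability per infected node and time step.
    Returns a list of (S, I, R) counts, one entry per time step.
    """
    by_time = defaultdict(list)
    for t, i, j in contacts:
        by_time[t].append((i, j))

    state = ["S"] * n_nodes
    state[seed_node] = "I"
    t_max = max(by_time) if by_time else 0
    curve = []
    for t in range(t_max + 1):
        # Transmission is evaluated on the state at the start of the step.
        newly_infected = set()
        for i, j in by_time[t]:
            for src, dst in ((i, j), (j, i)):
                if state[src] == "I" and state[dst] == "S" and rng.random() < beta:
                    newly_infected.add(dst)
        # Only nodes that were already infected can recover in this step.
        for node in range(n_nodes):
            if state[node] == "I" and rng.random() < gamma:
                state[node] = "R"
        for node in newly_infected:
            state[node] = "I"
        curve.append((state.count("S"), state.count("I"), state.count("R")))
    return curve
```

Because transmission is stochastic, such curves would typically be averaged over many runs with different random seeds before being compared with the curves of a real-world network.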
Figure 3 shows the results of applying the parameterized models STEPS, STEPS+RWP, and RAND to the four selected locations, covering not only the SIR curves but also the probability density functions of contact durations and edge counts in both the generated and the real-world networks. Contact durations and edge counts for all models are provided in Appendix A.2 and Appendix A.3. The STEPS approach successfully produces temporal contact networks that emulate real-world SIR curves, with the largest deviations observed for the office network. The results for STEPS+RWP are comparable, although this model also shows a significant deviation in the SIR curve and the contact durations for the high school. The RAND approach mostly captures the timing of the infection peak but yields more infections at most locations. In fact, RAND-based networks tend to underestimate edge counts, yet they produce a network topology associated with stronger infection dynamics than their real-world counterparts. This again underscores the critical role of network topology, beyond mere connectivity levels, in shaping SIR dynamics.
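The two distributions compared in this analysis can both be derived from a temporal edge list. As a minimal sketch, assuming contacts are stored as hypothetical (t, i, j) tuples with integer time stamps sampled at a fixed interval dt, a contact duration can be taken as a maximal run of consecutive time steps in which a pair is connected, and the edge count as the number of distinct active edges per time step. This is one common convention, not necessarily the exact definition used in the study.

```python
from collections import Counter, defaultdict


def contact_durations(contacts, dt=1):
    """Durations of maximal runs of consecutive time steps per node pair."""
    times_by_pair = defaultdict(list)
    for t, i, j in contacts:
        times_by_pair[(min(i, j), max(i, j))].append(t)  # undirected pairs
    durations = []
    for times in times_by_pair.values():
        times = sorted(set(times))
        run = 1
        for prev, cur in zip(times, times[1:]):
            if cur - prev == dt:
                run += 1  # contact continues into the next time step
            else:
                durations.append(run * dt)  # gap: close the current contact
                run = 1
        durations.append(run * dt)
    return durations


def edge_counts(contacts):
    """Number of distinct active edges per time step."""
    unique = {(t, min(i, j), max(i, j)) for t, i, j in contacts}
    return Counter(t for t, _, _ in unique)


def empirical_pdf(values):
    """Relative frequency of each observed value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in sorted(counts.items())}
```

Applying empirical_pdf to the output of contact_durations, or to the values of edge_counts, for a generated and a real-world network gives two discrete distributions that can be plotted side by side.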
Although our models do not yet reproduce the exact temporal distributions of edge counts (see Figure 3), they already produce realistic infection dynamics. We anticipate that incorporating additional network measures will further improve our modeling capabilities; this limitation is analyzed in the discussion section. Notably, in the supermarket scenario, where the arrival and departure times of each individual were available (see Section 3.2.2), our models successfully reflected the temporal aspects of the edge count distribution.
Regarding the contact duration distributions, the office and supermarket networks show the greatest deviations from the ground truth. STEPS largely replicates these distributions for the primary school and the high school. Interestingly, although the RAND model generally underestimates edge counts, its contact duration distributions vary: it overestimates contact durations in the supermarket case and strongly underestimates them for the high school.
5. Discussion
The results of our study demonstrate that employing HMMs and Bayesian optimization can effectively create dynamic temporal networks that mirror infection dynamics and certain network characteristics of temporal-dynamic networks constructed from observed real-world contacts. The high degree of interpretability of the optimized hyperparameters, coupled with the ability to control model parameters, underscores the robustness and utility of our approach. By incorporating detailed, micro-level encounter data, our methodology contributes significantly to enhancing the reliability and precision of infection forecasting models. This is particularly crucial for improving responses and strategies in future pandemics, ensuring that interventions are both timely and based on robust, data-driven insights. Our results not only validate the effectiveness of our modeling approach but also highlight the importance of granular contact networks for accurately predicting the spread of infectious diseases across various locations. Overall, the availability of fast, simple, and interpretable models is essential for rapid response in pandemic situations, a need our study addresses. These models can easily be parameterized and quickly deployed, providing effective solutions, even when data are scarce.
Examining the optimized parameters listed in Table 2 makes the models interpretable. One notable observation is that the models adopt parameters that align with real-world characteristics of the different locations. For instance, primary schools typically have fixed classrooms, whereas high schools often feature dedicated rooms for specific subjects. Additionally, the likelihood of interacting with individuals from other classes might be higher in high schools, particularly in courses such as language classes, where class assignments may vary. This tendency is reflected in all parameters: the attractor strength k, the Pareto distribution shape value , and the  value. Nevertheless, a different set of parameters could conceivably produce similar infection dynamics. Furthermore, this interpretation is strongly shaped by the characteristics of the reference locations used, which, in our case, are in Western countries. A deeper understanding and validation therefore require further evaluation and additional experiments.
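The section does not restate which quantity the Pareto distribution governs in the models, so the following sketch only illustrates what a Pareto shape parameter does in general. It assumes a hypothetical scale (minimum value) x_m and uses Python's random.paretovariate, which samples a Pareto distribution with minimum 1; the survival function P(X > x) = (x_m / x)^shape shows that a smaller shape produces a heavier tail, i.e., more very long values.

```python
import random


def pareto_samples(shape, scale, n, rng):
    """Draw n Pareto samples with the given shape and minimum value `scale`."""
    # rng.paretovariate(shape) has minimum 1, so multiplying by `scale`
    # shifts the minimum to `scale` (the x_m parameter).
    return [scale * rng.paretovariate(shape) for _ in range(n)]


def tail_fraction(samples, threshold):
    """Empirical fraction of samples strictly above `threshold`."""
    return sum(s > threshold for s in samples) / len(samples)


def analytic_tail(shape, scale, threshold):
    """Pareto survival function P(X > threshold) = (scale / threshold)^shape."""
    if threshold < scale:
        return 1.0
    return (scale / threshold) ** shape
```

For example, with a scale of 10 and a threshold of 100, analytic_tail gives about 0.032 for a shape of 1.5 but only 0.001 for a shape of 3.0.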
As discussed in Section 4, the supermarket representation exhibited a significant difference in the k parameter between STEPS with and without the RWP component: STEPS modeled the supermarket with a k of 2.387, whereas STEPS+RWP used a value of 9.161. Individuals are therefore more likely to stay within their default space under STEPS+RWP than under STEPS. This generally creates a more distinct clique structure, in which nodes in the same space are more likely to connect. However, both models seem to capture the short-term nature of the contacts by setting the parameter  so high that all individuals are effectively in constant motion, which leads to frequent but brief contacts. In this case, it does not matter whether these contacts occur within a single space or across different spaces, as both models can create temporal network topologies that reflect the properties of the ground truth network. Nevertheless, all interpretations must be treated with caution because of the strong interdependence among the parameters.
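To make the effect of the attractor strength concrete, the following sketch assumes a STEPS-like power law in which the probability of moving to a zone at distance d from the default zone is proportional to (1 + d)^(-k), for d = 0, ..., d_max. Both this functional form and the value d_max = 4 are assumptions for illustration; the exact formulation used in the study is given in its methods section and may differ.

```python
def distance_distribution(k, d_max):
    """Assumed law: P(D = d) proportional to (1 + d)^(-k), d = 0..d_max."""
    weights = [(1 + d) ** -k for d in range(d_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def stay_probability(k, d_max):
    """Probability of choosing the default zone itself (distance 0)."""
    return distance_distribution(k, d_max)[0]


# Attractor strengths reported for the supermarket; d_max is hypothetical.
for label, k in (("STEPS", 2.387), ("STEPS+RWP", 9.161)):
    print(label, round(stay_probability(k, d_max=4), 3))
```

Under these assumptions, k = 2.387 keeps an individual in the default zone with a probability of roughly 0.76, whereas k = 9.161 raises it above 0.99, which is consistent with the stronger clique structure described above.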
Among all locations, the supermarket scenario showed the closest agreement between the edge counts of the generated networks and those of the real-world network. We attribute this to the availability of precise arrival and departure times, as highlighted in Section 3.2.2. This information matters because individuals can only encounter each other while they are present at the same location. For the high school, primary school, and office networks, however, the arrival and departure times are inferred from the first and last edges in the data. In the supermarket scenario, by contrast, knowing exactly how many individuals could meet at any given time leads to more accurate modeling of edge counts over time. We expect that exact arrival and departure times would also improve performance for the other real-world networks, resulting in more consistent network characteristics and SIR outcomes.
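Inferring presence from the first and last recorded edges can be sketched as follows, assuming contacts are hypothetical (t, i, j) tuples. The helper names are illustrative only. Note that such an inferred window can only be shorter than or equal to the true presence interval, since an individual may arrive before their first recorded contact and leave after their last one, and individuals without any recorded contact do not appear at all. This is one reason why exact arrival and departure times give a more faithful count of who could meet at each moment.

```python
def infer_presence_windows(contacts):
    """Approximate arrival and departure per individual as the times of
    their first and last recorded contact."""
    windows = {}
    for t, i, j in contacts:
        for node in (i, j):
            start, end = windows.get(node, (t, t))
            windows[node] = (min(start, t), max(end, t))
    return windows


def present_count(windows, t):
    """Number of individuals whose inferred window contains time t."""
    return sum(start <= t <= end for start, end in windows.values())
```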
The RWP and TLW mobility models generally follow the concept of random movement: individuals randomly select a destination, move towards it, and then pause for a varying duration before repeating the process. This characteristic may explain why these models delivered moderate results for the supermarket network yet failed to capture the dynamics of the other real-world networks, namely the office, high school, and primary school. In a supermarket, individuals often move randomly between aisles, so RWP and TLW capture these patterns reasonably well. However, these models lack the concept of attachment to specific locations, a key feature of offices, high schools, and primary schools, where people tend to stay in defined areas for extended periods. The STEPS approach, which emphasizes a stronger attachment to certain spaces within a location, better represents these scenarios.
In terms of cost, the STEPS+RWP model performed best in the supermarket network, with a close match in network properties and SIR curves. Although it also handled the primary school network adequately, it struggled to simulate the high school network accurately. On the one hand, the model performed well in two very different locations, the primary school and the supermarket. This adaptability can be attributed to its blend of a random component and a clique-forming component, which is driven by individuals being tied to default spaces. On the other hand, its performance declined significantly in the high school setting, revealing limits to its adaptability. The specific characteristics of our models, particularly the role of the RWP component in STEPS+RWP, will be the subject of future investigations.
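The random movement behavior of RWP described above can be sketched in a few lines. This is a basic textbook-style random waypoint model, not the authors' implementation: the rectangular area, the constant speed, the uniformly drawn pause length, and the distance-based contact rule with a hypothetical radius are all assumptions for illustration.

```python
import math
import random


def random_waypoint(n_steps, width, height, speed, max_pause_steps, rng, dt=1.0):
    """Positions of one walker under a basic random waypoint model."""
    x, y = rng.uniform(0, width), rng.uniform(0, height)
    dest = None
    pause_left = 0
    positions = []
    for _ in range(n_steps):
        if pause_left > 0:
            pause_left -= 1  # stay in place while pausing
        else:
            if dest is None:  # pick a new destination uniformly in the area
                dest = (rng.uniform(0, width), rng.uniform(0, height))
            dx, dy = dest[0] - x, dest[1] - y
            dist = math.hypot(dx, dy)
            step = speed * dt
            if dist <= step:  # destination reached: arrive, then pause
                x, y = dest
                dest = None
                pause_left = rng.randint(0, max_pause_steps)
            else:  # move one step straight towards the destination
                x += dx / dist * step
                y += dy / dist * step
        positions.append((x, y))
    return positions


def contacts_from_trajectories(trajectories, radius):
    """(t, i, j) for every pair of walkers within `radius` at time step t."""
    contacts = []
    n = len(trajectories)
    for t in range(len(trajectories[0])):
        for i in range(n):
            for j in range(i + 1, n):
                (xi, yi), (xj, yj) = trajectories[i][t], trajectories[j][t]
                if math.hypot(xi - xj, yi - yj) <= radius:
                    contacts.append((t, i, j))
    return contacts
```

Because every walker chooses destinations uniformly over the whole area, nothing in this sketch ties an individual to a particular room or zone, which illustrates the missing attachment to specific locations discussed above.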
While our study provides insights into the behavior and characteristics of temporal contact networks, several limitations must be acknowledged. Our current approach of temporally stacking networks to represent extended time periods does not accurately capture the long-term dynamics of infection spread. Nevertheless, we believe that our methodology remains valuable for providing insights into, and deepening our understanding of, how confined spaces influence infection dynamics. To improve the accuracy of our models, future studies will require real-world contact data covering longer periods.
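The stacking limitation can be illustrated with a minimal sketch of the general idea, assuming a single-day contact list of hypothetical (t, i, j) tuples with time stamps in [0, day_length). The study's own stacking procedure is described in its methods and may differ in detail. Every repeated day is an identical copy, so day-to-day variation in who meets whom, which matters for long-term spreading, is absent by construction.

```python
def stack_days(contacts, n_days, day_length):
    """Repeat a single-day temporal contact list n_days times by shifting
    each copy's time stamps by a multiple of day_length."""
    if any(t < 0 or t >= day_length for t, _, _ in contacts):
        raise ValueError("all time stamps must lie in [0, day_length)")
    return [(t + d * day_length, i, j) for d in range(n_days) for t, i, j in contacts]
```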
Our parameter optimization strategy, described in Section 3.3.3, aims to replicate SIR properties, contact durations, and edge counts. Future work should also address the temporal distribution of edge occurrences to capture typical events in environments such as high schools, offices, or supermarkets (e.g., rush hours or lunch breaks). Beyond incorporating exact arrival and departure times and location-specific temporal events, future models could also benefit from integrating additional human mobility frameworks that more closely represent the complex behaviors and interactions found in these environments.
6. Conclusions
This paper has presented a comprehensive approach to modeling micro-level contact networks through human mobility models, with a focus on improving the realism of infection spreading in temporal-dynamic networks. By combining Bayesian optimization for hyperparameter tuning with network metrics, we have demonstrated that our approach can replicate the infection propagation characteristics of real-world contact networks. By capturing the nuances of different confined spaces, our work can contribute to the overall quality of pandemic simulations and improve the reliability of forecasting models.
The discussion has highlighted the strengths of our methodology, including the ability to optimize HMM parameters using real-world network data, the analysis of network metrics, and the interpretability of the optimized model parameters. These advancements pave the way for a more nuanced understanding of how different micro-level encounter models affect the spread of infectious diseases. However, the study also has limitations, such as the constrained scope of our experiments and the need for broader validation across diverse locations and scenarios. Besides expanding the dataset, exploring additional models, and integrating our approach into larger-scale epidemic simulations with multiple locations, future work could incorporate temporal events specific to certain location types, such as lunch times in offices and peak hours in supermarkets, to better reflect real-world dynamics.
In conclusion, this paper contributes to the field of epidemiological modeling by offering a framework for generating contact networks that align with infection propagation characteristics observed in temporal contact networks constructed from real-world contacts. Our model can be particularly useful for regions or countries where such detailed data are unavailable, providing valuable insights through simulated scenarios. Our work fills a gap by providing a method to model infection dynamics in confined spaces, thereby supporting larger-scale epidemic simulations and forecasting models.