Article

Macro-Scale Temporal Attenuation for Electoral Forecasting: A Retrospective Study on Recent Elections

by
Alexandru Topîrceanu
Department of Computer and Information Technology, Politehnica University Timişoara, 300006 Timişoara, Romania
Mathematics 2025, 13(4), 604; https://doi.org/10.3390/math13040604
Submission received: 10 January 2025 / Revised: 31 January 2025 / Accepted: 10 February 2025 / Published: 12 February 2025
(This article belongs to the Special Issue Advances in Multi-Criteria Decision Making Methods with Applications)

Abstract:
Forecasting election outcomes is a complex scientific challenge with notable societal implications. Existing approaches often combine statistical analysis, machine learning, and economic indicators. However, research in network science has emphasized the importance of temporal factors in the dissemination of opinions. This study presents a macro-scale temporal attenuation (TA) model, which integrates micro-scale opinion dynamics and temporal epidemic theories to enhance forecasting accuracy using pre-election poll data. The findings suggest that the timing of opinion polls significantly influences opinion fluctuations, particularly as election dates approach. Opinion “pulse” is modeled as a temporal function that increases with new poll inputs and declines during stable periods. Two practical variants of the TA model, ETA and PTA, were tested on datasets from ten elections held between 2020 and 2024 around the world. The results indicate that the TA model outperformed several statistical methods, ARIMA models, and best pollster predictions (BPPs) in six out of ten elections. The two TA implementations achieved an average forecasting error of 6.92–6.95 percentage points across all datasets, compared to 7.65 points for BPP and 14.42 points for other statistical methods, demonstrating a performance improvement of 10–83%. Additionally, the TA methods maintained robust performance even with limited poll availability. As global pre-election survey data become more accessible, the TA model is expected to serve as a valuable complement to advanced election-forecasting techniques.

1. Introduction

Electoral forecasting represents a significant scientific topic in our increasingly interconnected society and attracts considerable attention across various research disciplines [1,2,3]. Historically, forecasting election results has relied on traditional statistical models that assess opinion polls before election day [4,5]. Since the late 1970s, the strategic selection of the election date has been recognized as potentially impacting the results [5]. In presidential elections, forecasting uses macromodels [6], which are statistical models reflecting national economic and political changes. Conversely, micromodels are derived from individual voter surveys conducted during the pre-election phase [7].
Today, forecasting election results benefits from modern techniques such as social media analysis and data science [8,9,10]. Advanced forecasting now often incorporates multilevel regression and post-stratification (MRP) [11,12,13]. Prominent forecasting institutions, particularly in the United States, such as Real Clear Politics, FiveThirtyEight, and the Understanding America Study, analyze election data using MRP variants. A concise overview of the methodologies used by these pollsters comprises the following steps:
  • Polls are weighted and averaged according to each pollster’s reliability.
  • Polls are adjusted for factors such as anticipated voter turnout, convention effects, and the exclusion of third-party candidates.
  • Economic and demographic data are incorporated to scale surveys at the state and national tiers.
  • Probabilistic distributions are used in simulation to represent the uncertainty in the data.
Alongside MRP, auto-regressive integrated moving-average (ARIMA) models have been successfully used to forecast the UK General Election from daily tweets [14] and Brexit from the daily fluctuations of the British Pound [15]. Studies also suggest that alternative subjective surveying methods can be effective predictors. For instance, the American National Election Surveys from 1956 to 1996 indicate that voters’ own expectations were capable of predicting presidential election winners [7]. Another study found that quick, instinctive facial assessments of political candidates predicted election results better than detailed competence evaluations [16]. Therefore, voter expectation-based prediction models present a promising substitute for traditional statistical methods [17].
We highlight notable research gaps in current electoral forecasting related to temporal factors. First, traditional models (e.g., ARIMA or statistical polling averages) offer limited integration of temporal dynamics, focusing on snapshots of polling data and neglecting the evolving nature of opinion over time [18]. Second, the inconsistent use of poll timing and poll frequency across different time periods hinders weighting more recent polls and predicting last-minute opinion swings [19]. Third, the decaying influence of older polls and the momentum of recent trends are often underexplored or oversimplified in forecasting methodologies [18]. Fourth, real-time forecasting models that adjust dynamically as new polls or events emerge are lacking [20]. Fifth, the role of external events (e.g., crises, scandals) in altering temporal opinion trends is not consistently captured [21]. Addressing these gaps promises improved forecasting accuracy, the identification of critical time windows, a better understanding of voter behavior, early detection of trends and momentum, and support for real-time decision-making.
Related studies have explored how the extensive media coverage of opinion polls might influence voters before elections [22]. Social networks are pivotal in the dissemination of information, effectively shaping large-scale events [8,23], as seen in the Arab Spring of 2010 [24,25] and the US presidential elections of 2008 [26] and 2012 [27]. These platforms enable a widespread distribution of information that forms a contemporary social dimension [10]. Understanding this dynamic layer enhances predictive insights into the real-world social networks they replicate, with applications in marketing, public relations [3,28], epidemic spreading [29,30], hurricane forecasting [31], and predicting box-office earnings [32].
In the more recent literature, Leiter et al. [33] argue that individual voter predictions often have higher accuracy than voter intention polls, mainly based on social network activity. Their study considers direct measures of social networks and finds that network size, political composition, and frequency of political discussion are the most significant network parameters predicting the accuracy of citizens’ election forecasts. Furthermore, Jennings et al. [34] provide a data-driven estimation of how much “lead” in the polls allows for the prediction of the election outcome, and offer estimates of the optimal duration of leading in the polls. Liu et al. [21] propose a deeper integration of Twitter data into political forecasting than previous work: tweet sentiment replaces polling data, and classical forecasting models are transferred from the national level to the county level (i.e., the finest spatial level of voting counts in the US). Kennedy et al. [35] rely on Bayesian additive regression trees (BARTs) to analyze election datasets. Their model is able to predict 80–90% of direct elections from global data. Their most important conclusions are that economic indicators are weak predictors of elections, while global polling is a robust predictor of outcomes; the authors also predict that quantitative electoral forecasts will remain the most viable forecasting solution in the near future. Finally, we note the work of Wang et al. [36], which proposes that non-representative polls may be used instead of representative polls, with proper statistical adjustments. The former have the advantage of being faster and cheaper to gather than traditional survey methods. Results from the Xbox gaming platform, adjusted with MRP, were able to provide forecasts as accurate as those of the leading pollsters.
Network science aims to grasp diffusion processes by structuring interactions at the micro scale (i.e., among individuals) and then predicting opinion shifts at the macro scale [37]. The macroscopic outcomes can be deduced by: (i) identifying when individuals are indoctrinated by their surroundings (i.e., adopting information, contracting infections, purchasing goods [1,3,38,39]), and (ii) forecasting the dissemination of information cascades and their individual-driven diffusion. However, timing factors are critical in influencing diffusion dynamics [9,10], as well as in the release of opinion polls [22] or the scheduling of elections [5].
Indeed, temporal factors play a critical role in election prediction due to the dynamic nature of public opinion and external events. For instance, during the 2016 U.S. presidential race, an impactful announcement regarding the emails of Hillary Clinton (the Democratic candidate) occurred just days before the election, potentially swaying undecided voters towards the Republican candidate. Also, in the final days leading up to the 2016 UK Brexit referendum, the assassination of an MP shocked the nation, briefly shifting public sentiment toward the “Remain” camp. During the 2017 French presidential election, some voters shifted their support in the final days toward a more viable candidate to avoid “wasting” their vote, a phenomenon known as tactical voting. Thus, voters rallied around Emmanuel Macron to prevent Marine Le Pen from winning. Similarly, the COVID-19 pandemic significantly impacted election dynamics in 2020, emphasizing healthcare and crisis management in voter decision-making. Notably, Donald Trump’s handling of COVID-19 became a key differentiator for Joe Biden, who emphasized a science-based and empathetic approach, likely helping him secure the presidency.
The motivation of our study is to examine how social macroscopic behavior, in varied electoral contexts across the world, can be inferred solely from microscopic temporal dynamics during pre-election phases. Here, we rely on our innovative macro-scale temporal attenuation (TA) model, previously introduced in [40], which is tested here in terms of electoral forecasting accuracy, relying only on pre-election poll data for each election in each chosen country. We introduce two variants of the TA model (i.e., ETA and PTA, described in Section 2), which are distinct in their ability to capture the fluctuating pulse of public opinion as it evolves with the input of poll data.
The effectiveness of our forecasting model is assessed through the total positive error (TPE), the mean absolute percentage error (MAPE), and the root mean squared error (RMSE). The obtained poll predictions are compared with ARIMA models fitted to pre-election data and the best pollster predictions (BPPs) from each studied country, incorporating newer MRP-based forecasts (mainly those from the US). As such, our paper claims the following contributions:
  • We devise an analytic approach for the macro-scale modeling of multi-opinion systems to improve election poll forecasts.
  • We establish an experimental framework utilizing pre-election data to test the foundational assumptions of our method.
  • We provide a case study on ten recent elections from different countries between 2020–2024, evaluating our method’s efficiency against advanced forecasts, including ARIMA and MRP, utilized by top pollsters.
  • We investigate the real-time applicability of the two TA variants during pre-election periods and compare their performance against the best pollster predictions at various intervals leading up to the election day.
The main aim of this study is to establish and validate an electoral forecasting method that operates independently of demographic, economic, or political-context data. This differentiates our TA method favorably from methods like MRP by allowing for its application, given sufficient reliable public polls, across any global political region, without contextual knowledge. Consequently, the results presented in this paper were obtained without incorporating additional information about any of the chosen countries around the world. Given the persisting socio-political challenges, our research holds substantial societal value, particularly for industry and governments, by offering insights for an enhanced understanding of electoral system dynamics.

2. The Macro-Scale Temporal Attenuation Model

2.1. Prerequisites and General Considerations

Our temporal attenuation (TA) model is based solely on pre-election data, available for a specific election with any number N of candidates, in the form of percentages or numbers of votes for each candidate. As such, the first step is collecting the pre-election multi-opinion polls over an arbitrary period leading up to just before election day. Each poll is associated with a specific day t that represents the relative date the poll was made public. For each electoral candidate, we define the temporal opinion poll vectors ω_k(t), where k is the candidate’s index in the system (1 ≤ k ≤ N), and t denotes the poll’s relative date. We describe, as a macro-scale opinion injection, all the data from polls ω_k(t) that occur before election day t_e, within 0 ≤ t < t_e. The discrete temporal election axis t ∈ [0, t_e) is relative to the date of the first available poll, marked ω_k(t = 0), and the final poll in the dataset, before t = t_e.
The second step, and an integral component of TA, involves accurately replicating the real-world timing when integrating opinions, specifically at the daily level in this study. Thus, rather than compressing consecutive polls (i.e., ignoring their dates and the number of days in between), we distribute them over time to simulate fluctuations and periods of stability that reflect overall public sentiment. The poll vector ω , which corresponds to all candidates 1 k N at time t, is described as follows:
$$\omega(t) = \begin{cases} \left\{ \dfrac{\omega_1^*(t)}{\sum_{k} \omega_k^*(t)}, \ldots, \dfrac{\omega_N^*(t)}{\sum_{k} \omega_k^*(t)} \right\}, & \text{if a poll exists on day } t \\ \{0, \ldots, 0\}, & \text{otherwise} \end{cases} \tag{1}$$
Equation (1) indicates that when a poll is present for a given day, we compute a normalized opinion value from the raw poll data ω_k^*(t) (e.g., we divide the number of votes for candidate k by the total number of votes for all candidates). Normalization is important because some polls are expressed as the number of voters for a candidate, where the absolute number of voters varies from hundreds to tens of thousands.
Given all the opinion poll vectors ω(t), we define the set Ω that reflects the daily opinion towards each candidate. If certain days lack opinion polls, we address this by assigning a 0 (no vote) to each candidate for those days 0 < t < t_e. Thus, the set Ω consists of continuous (daily) opinion poll vectors for 0 ≤ t < t_e:
$$\Omega = \{\, \omega(t=0),\ \omega(t=1),\ \ldots,\ \omega(t = t_e - 1) \,\} \tag{2}$$
The situation when there is no public poll on a given day t is quite common, and these days are equally important to the model, as public opinion “settles” during periods of relaxation.
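The two preprocessing steps above, normalizing each raw poll (Equation (1)) and padding poll-free days with zero vectors (Equation (2)), can be sketched as follows. This is an illustrative Python sketch, not the paper's code (the authors' implementation is proprietary Java); the function name `build_daily_polls` and its signature are assumptions.

```python
from datetime import date, timedelta

def build_daily_polls(polls, start_day, election_day):
    """Build the set Omega of daily poll vectors for N candidates.

    `polls` maps a publication date to the raw values omega*_k (votes or
    percentages) for each candidate; days without a poll receive the
    zero vector, mirroring Equations (1) and (2).
    """
    n_candidates = len(next(iter(polls.values())))
    omega = []
    for t in range((election_day - start_day).days):   # t in [0, t_e)
        day = start_day + timedelta(days=t)
        if day in polls:
            raw = polls[day]
            total = sum(raw)                           # normalize by the total
            omega.append([v / total for v in raw])
        else:
            omega.append([0.0] * n_candidates)         # relaxation day (no poll)
    return omega
```

Normalization at this stage is what allows polls reported as absolute vote counts and polls reported as percentages to coexist in the same dataset.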

2.2. From Micro-Scale Interactions to Macro-Scale Behavior

Current micro-scale models for opinion injection typically utilize static thresholds or thresholds that change according to basic probabilistic methods [41,42,43], without taking temporal dynamics into account [44]. However, in related epidemiological research, we find well-known parametric models for disease transmission likelihood that incorporate time as a factor [45,46]: exponential and power-law. In their continuous epidemiological formulation [46], these functions capture the time-dependent transmission likelihood λ_i(t) over the period Δt since an individual i became infected.
We derive motivation from these functions, as they represent the temporal reduction in infectiousness post exposure and can similarly track opinion dynamics following each opinion injection, such as an opinion poll. Thus, we propose the exponential (ETA) and power-law (PTA) temporal attenuated models, and examine their effectiveness in reproducing the natural evolution of individuals’ opinion dynamics in the context of electoral opinion. The two epidemiological transmission likelihood functions are defined, for any individual i, by the expressions in Equation (3) using arbitrary parameters α and β .
$$\lambda_i(t) = \alpha_i \cdot e^{-\Delta t \, \beta_i} \ \ \text{(exponential)} \qquad \lambda_i(t) = \alpha_i \cdot \Delta t^{-\beta_i} \ \ \text{(power-law)} \tag{3}$$
We further define the two TA models with parameters μ and δ. The parameter μ sets the magnitude of the bounce when an opinion is injected in the form of a poll, while δ regulates the damping rate toward equilibrium (λ(t) = 0) for any candidate (refer to Figure 1).
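For concreteness, the two attenuation kernels of Equation (3) can be evaluated numerically as follows (a sketch assuming the decaying forms α·e^(−Δt·β) and α·Δt^(−β); the function names are illustrative):

```python
import math

def lam_exponential(dt, alpha, beta):
    """Exponential kernel: alpha * exp(-dt * beta)."""
    return alpha * math.exp(-dt * beta)

def lam_power_law(dt, alpha, beta):
    """Power-law kernel: alpha * dt**(-beta), for dt >= 1."""
    return alpha * dt ** (-beta)
```

Both kernels decay monotonically toward 0 as Δt grows; β controls the damping speed, playing a role analogous to δ in the TA models.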
We note that the focus of this study excludes the implementation or adaptation of epidemic models such as SIR/SEIR to describe opinion dynamics. The diffusion mechanisms (infection versus opinion) have several fundamental differences: (i) opinion models lack clearly distinguishable infectious states (like E, I, R), (ii) opinionated individuals do not recover after being exposed to an opinion, like in the case of disease, and (iii) diffusion is more accurately represented by independent cascades and linear thresholds instead of infection/recovery rates [42,47].
Using the aforementioned temporal micro-scale models (Equation (3)), we define the notion of the opinion pulse Ψ_k(t): the macro-scale aggregate estimation of candidate k’s opinion trajectory at time t. To compute Ψ_k(t), we first define the weight w_k(t) of the opinion toward a candidate k, which captures the following dynamic: during a relaxation phase, i.e., if no opinion is introduced at time t (meaning ω_k(t) = 0), w_k(t) stays the same as in the previous iteration (t − 1); therefore, as time increases discretely (t − 1 → t), the opinion pulse Ψ_k(t) diminishes. Conversely, when opinion is injected through a poll with ω_k(t) > 0, w_k(t) is increased by an amount proportional to the normalized vote ratio ω_k(t). The temporal progression of w_k(t) is described by the following equation:
$$w_k(t) = \begin{cases} w_k(t-1) \cdot t^{\delta} + \mu \cdot \omega_k(t), & \text{if } \omega_k(t) > 0 \ \text{(poll exists at } t\text{)} \\ w_k(t-1), & \text{if } \omega_k(t) = 0 \ \text{(no poll exists at } t\text{)} \end{cases} \tag{4}$$
where the magnitude and damping rates of the weight depend on the two model parameters μ and δ . We further hypothesize Ψ k ( t ) for both ETA and PTA in Equation (5) based on the opinion weight w k ( t ) .
$$\Psi_k(t) = \begin{cases} w_k(t) \cdot e^{-t\delta}, & \text{for ETA} \\ w_k(t) \cdot t^{-\delta}, & \text{for PTA} \end{cases} \tag{5}$$
Finally, through the temporal evolution of each opinion pulse, the general opinion Θ k towards a candidate k can be inferred by normalizing the pulse of each candidate, at any moment in time 0 t < t e , as follows:
$$\Theta_k(t) = \Psi_k(t) \Big/ \sum_{i=1}^{N} \Psi_i(t) \tag{6}$$
To better understand the principle behind the opinion pulse, we draw an analogy with ocean waves. Each candidate has an opinion pulse, synchronized with all other pulses, just as waves move in a synchronized manner. When an opinion poll is introduced into the model at time t, each wave rises; when no opinion is introduced, say at t + 1, all waves start falling. The difference between the candidates’ opinion pulses is the magnitude of each wave, which translates into their likelihood of being voted for.
The evolutionary dynamics of the opinion pulses Ψ k ( t ) and the general opinion Θ k ( t ) are illustrated in Figure 1 using a proof-of-concept voting system. For more in-depth details on how TA compares to other forecasting methods, refer to Appendix A.
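The weight update, the two pulses, and the normalized opinion (Equations (4)–(6)) can be sketched end-to-end as follows. This is an illustrative Python sketch under the definitions above; the authors' actual implementation is proprietary Java, and `temporal_attenuation` is a hypothetical name.

```python
import math

def temporal_attenuation(omega, mu, delta, variant="ETA"):
    """Evolve weights w_k(t), pulses Psi_k(t), and opinions Theta_k(t)
    over the daily poll vectors in `omega` (the set Omega)."""
    n = len(omega[0])
    w = [0.0] * n
    theta = []
    for t, poll in enumerate(omega, start=1):   # discrete days t = 1, 2, ...
        for k in range(n):
            if poll[k] > 0:                     # poll injected: boost the weight
                w[k] = w[k] * t ** delta + mu * poll[k]
            # otherwise: relaxation day, w[k] stays unchanged
        if variant == "ETA":
            psi = [w[k] * math.exp(-t * delta) for k in range(n)]
        else:                                   # PTA
            psi = [w[k] * t ** (-delta) for k in range(n)]
        total = sum(psi)
        theta.append([p / total if total > 0 else 0.0 for p in psi])
    return theta
```

Note that the normalization in Equation (6) cancels the attenuation factor common to all candidates on a given day, so only the relative weights determine the general opinion.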

2.3. Computational Framework Based on Temporal Attenuation

We further consolidate the defined terms to illustrate the application of ETA and PTA methods to pre-election data using a developed computational framework. Figure 2 illustrates the step-wise application of our prediction methods, from curating the pre-election poll set Ω to computing the opinion pulse and general opinion for each candidate.
The input of our TA methods is a dataset comprising pre-election polls, where every poll conveys opinions regarding each candidate k within a multi-opinion system with N candidates. In general, available poll data are given in the form of rows, where each row corresponds to a unique pre-election poll, listed in descending order of date (from newest to oldest). A row may contain information about the pollster who conducted the field research, the date or period of the poll, the total number of respondents, and, most importantly, the number of votes or percentages for the main candidates.
In the data-parsing phase, shown in Figure 2A, the raw poll data are prepared in a simplified text format containing only the date of a poll and the percentages for each candidate. Here, we ensure that all dates are converted into the same format (e.g., MM-DD-YY). For this first step, minor manual corrections are often needed, and the data are then fed to a parser written specifically for each dataset. The output of the parser is a list of poll data for each date, in descending order of the date. In the data-ordering phase, shown in Figure 2B, the set Ω is created, comprising daily poll vectors ω_k(t) for each day 0 ≤ t < t_e. In this sense, the poll data are first inverted, to start from the oldest poll, and then zeros (0%) are added for all candidates on the days when no poll was published. The temporal attenuation phase, illustrated in Figure 2C, consists of feeding the obtained set Ω to our computational framework, where the opinion pulse Ψ_k(t) is derived from Ω using the weights w_k(t), magnitude μ, damping factor δ, and daily poll vectors ω_k(t). The outcome, detailed as the daily opinion evolution Θ_k(t), is calculated from each pulse Ψ_k(t). All of the underlying code for this study is proprietary and was written by the authors in Java. A further and more detailed numerical example of processing the input data using our TA method is given in Appendix B.
Thus, we summarize the steps behind how TA is applied, based on Figure 2, as follows:
  • The input for our TA methods is a dataset of pre-election polls, where each poll provides the percentage of support for each candidate in a multi-candidate system.
  • All poll data are organized into rows, and each row represents a single poll.
  • Data parsing: poll data are simplified to include only poll dates and candidate percentages.
  • Data ordering: the dataset is reorganized into daily poll vectors, starting from the oldest (first) poll. Missing poll days are filled with zeros (0%) for all candidates.
  • Temporal attenuation: the ordered data are fed into our computational framework to calculate the opinion pulse Ψ k ( t ) for each candidate k.
  • The opinion pulse is used to derive the daily opinion evolution Θ k ( t ) toward each candidate.

3. Materials and Methods

Prediction models that incorporate microscopic interactions frequently depict data as information cascades initiated by opinion sources (also referred to as spreader nodes, stubborn agents, or vital nodes) [41,42,43]. Although these interactions hold significance, it remains technically, ethically, and legally impractical to account for them all (for instance, scrutinizing every tweet from all users in a country before an election). Therefore, we utilize widespread data, specifically pre-election opinion polls, which are accessible and aggregated from various open access sources, as detailed in Appendix C.
In our previous study [40], where we first introduced the concept of temporal attenuation (TA), we presented a case study examining US presidential elections from 1968 to 2016, in order to highlight the effectiveness of the TA method compared to several statistical methods for electoral prediction. As such, our previous study tailored TA specifically to the US electoral system: to predict the popular vote for the two candidates from the main political parties.
In terms of the novelty of this study, the idea of a charging and discharging capacitor, in relation to the opinion polls being made public, was introduced in [40] as the “momentum” of opinion. The momentum was influenced directly by the raw number of votes received by each candidate from a poll. In contrast, in our current study, we added the opinion weight w k ( t ) which acts like the previous momentum, but uses the normalized number of votes. We noticed that normalization increases the applicability of the model in multi-party systems. Next, the newly added concept of opinion pulse is based on the opinion weight, which implements the actual exponential or power-law-based attenuation.
Secondly, the current study demonstrates the flexibility of the improved mathematical model, namely by using TA on more heterogeneous datasets (in terms of number of polls and pre-election period duration) and in several multi-party systems (any number of candidates), using 10 datasets from nine different countries around the world, each with significant socio-political differences. We believe that both the mathematical and practical improvements are valuable for the community, since they support the wider applicability of our TA model.
We further compare poll estimates obtained through our ETA and PTA methods against traditional statistical approaches such as ARIMA modeling (AR), survey averaging (SA), cumulative vote counting (CC), and the best pollster predictions (BPPs) of the time. The actual poll results from each election serve as the ground truth for validating forecasts.

3.1. Election Datasets

We aggregate electoral pre-election poll data from the following countries (in alphabetical order): Argentina (2023), Brazil (2022), Canada (2021), France (2022), Indonesia (2024), Poland (2020), Romania (2024), Turkey (2023), and the USA (2020, 2024). We note that the respective Canadian parliamentary elections were for the election of the Prime Minister, while all other datasets correspond to Presidential Elections. In all cases, we aim to predict the popular vote of the first round (to which the pre-election polls refer). For the US, data were aggregated from the Real Clear Politics website. For all other countries, we found the Wikipedia pages dedicated to the presidential elections as the most complete and reliable sources for pre-election data. Table 1 provides information on all 10 datasets, including the election date, the number of pre-election (PE) polls available, the PE period, and the final results for each candidate. We chose varied electoral systems, with 2–6 (main) candidates, to showcase that TA works well in all situations. The percentages displayed in Table 1 are used as the ground truth to measure the performance of our TA method. Additional information on the input data, including the names of all candidates, is found in Appendix D.
The inclusion of these datasets aims to ensure the relative recency of the elections, broad coverage of different electoral systems, and high variability in dataset sparseness. As such, a close look at Table 1 shows that we use electoral datasets with as few as 52 pre-election polls (Indonesia) and as many as 366 (Canada); the pre-election period ranges from 1.5 months (Poland) to 21 months (USA); and the number of candidates varies from 2 to 6 (where necessary, we excluded all candidates beyond the top six). In terms of geographic diversity, we managed to collect data from four continents; unfortunately, we did not find sufficient data to include any African country. Finally, we also intended to provide diversity in terms of the democratic robustness of the electoral systems [48].

3.2. Statistical Methods for Poll Estimation

To quantify the electoral forecasting accuracy of our two TA methods (ETA and PTA), we present statistical benchmarks that allow for comparison between traditional models and the proposed TA models. These benchmarks are categorized into two groups: statistical estimates, such as cumulative counting (CC) and survey averaging (SA), and advanced statistical models, including ARIMA (AR) and best pollster predictors (BPPs).
In cumulative vote counting (CC), the total votes for each electoral candidate k are obtained by aggregating votes from polls ω k * ( t ) over the entire pre-election period [ 0 , t e ) . Importantly, CC utilizes the absolute vote count per poll, i.e., ω k * ( t ) , rather than the normalized count ω k ( t ) . Thus, we define a similar cumulative opinion pulse c Ψ k ( t ) as follows:
$$c\Psi_k(t) = \begin{cases} c\Psi_k(t-1) + \omega_k^*(t), & \text{if } \omega_k(t) > 0 \\ c\Psi_k(t-1), & \text{if } \omega_k(t) = 0 \end{cases} \tag{7}$$
When the polling period ends, i.e., t = t e , each cumulative opinion pulse c Ψ k ( t ) retains the aggregate votes for each of the N candidates. The current opinion towards a candidate k, c Θ k ( t ) , can be determined at any moment by normalizing the respective opinion pulses:
$$c\Theta_k(t) = c\Psi_k(t) \Big/ \sum_{i=1}^{N} c\Psi_i(t) \tag{8}$$
Survey averaging (SA) involves calculating the mean of the normalized poll results throughout the pre-election timeframe. To compute the general opinion s Θ k ( t ) after t days have passed since the beginning of the pre-election period, the normalized poll vector ω k ( t ) is utilized directly:
$$s\Theta_k(t) = \frac{\sum_{0 \le i \le t} \omega_k(i)}{\left|\{\, i \mid \omega_k(i) > 0,\ 0 \le i \le t \,\}\right|} \tag{9}$$
Equation (9) expresses the general opinion at any time t (0 ≤ t < t_e) by totaling all normalized votes from [0, t] and dividing by the number of existing polls (the cardinality) in that interval. For the whole pre-election period, when t = t_e, the general opinion toward a candidate k based on survey averaging becomes sΘ_k = Σ_{0 ≤ t < t_e} ω_k(t) / t_e. The key difference between SA and CC lies in the reliance of CC on the absolute vote counts for candidates, while SA utilizes normalized polling data.
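The two statistical baselines can be sketched as follows (an illustrative Python sketch of Equations (7)–(9); the function names are assumptions). CC consumes absolute counts ω*, while SA consumes the normalized vectors ω:

```python
def cumulative_counting(raw_polls):
    """CC: sum the absolute counts omega*_k over all polls, then
    normalize once at the end (Equations (7) and (8))."""
    n = len(raw_polls[0])
    totals = [sum(poll[k] for poll in raw_polls) for k in range(n)]
    grand = sum(totals)
    return [v / grand for v in totals]

def survey_averaging(normalized_polls):
    """SA: average the normalized poll vectors omega_k (Equation (9))."""
    n = len(normalized_polls[0])
    m = len(normalized_polls)          # number of existing polls
    return [sum(poll[k] for poll in normalized_polls) / m for k in range(n)]
```

On two example polls, one with 1000 respondents split 600/400 and one with 200 respondents split 100/100, CC yields 700/1200 ≈ 0.583 for the first candidate, while SA yields (0.6 + 0.5)/2 = 0.55, illustrating how CC overweights large-sample polls.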
ARIMA (AR) models rely solely on the time-series data of a dependent variable, ignoring any independent variables. They utilize current and historical data points of the variable to produce dependable short-term forecasts. ARIMA models have been applied effectively to the UK’s Brexit forecast by analyzing British Pound fluctuations [15], the UK General Election through Twitter data [14], and, most commonly, in predicting stock-market trends [49].
Often, the usage of AR starts with a grid search for hyper-parameter optimization to identify optimal combinations of (p, d, q), where p is the autoregressive order, d is the degree of differencing used to transform nonstationary data into stationary data, and q is the moving-average order. Initial experiments pointed to an ARIMA(5,1,0) model for the election datasets. However, we decided to use the auto.arima method available in the R package forecast, which uses a variation of the Hyndman–Khandakar algorithm [50] that combines multiple automatic tests. As a benchmark, ARIMA estimations are made for election day t_e, specifically 1–9 days after the final pre-election poll in each of our ten datasets.
Finally, we employ the best pollster predictions (BPPs) from pollsters for each election cycle. The top pollsters in each election dataset, along with their candidate projections, are shown in Appendix D. Over the last decade, pollsters have incorporated optimal statistical and computational techniques, such as MRP, to improve prediction accuracy. Including BPPs in our comparative analysis serves as a significant reference point, since these pollsters employ the state of the art in current electoral forecasting; thus, their predictions are the best real-world benchmark.

3.3. Forecasting Performance Metrics

In each forecast based on TA with the parameters μ and δ, we quantify the total positive error TPE(μ, δ) as an absolute point deviation from the actual election results. Using the ground-truth results, denoted as $\bar{\Theta}_k^d$ for candidate $k$ in dataset $d$ (dataset symbols are defined in Table 1), the total positive error TPE is defined as the cumulative positive differences in estimation errors for all N candidates:

$$ TPE_d(\mu, \delta) = \sum_{1 \le k \le N} \left| \bar{\Theta}_k^d - \Theta_k^d(\mu, \delta) \right| \tag{10} $$
The metric TPE provides a gauge of performance and serves as an intuitive performance indicator of TA compared to the state-of-the-art forecasting methods. However, due to the underlying simplicity of TPE, we supplement all of our model’s performance evaluations with the mean absolute percentage error (MAPE) and the root mean squared error (RMSE) [51].
MAPE and RMSE serve to evaluate our TA method’s forecasting accuracy against ground-truth results for each electoral candidate. The formal definition of MAPE is:
$$ MAPE_d = 100 \cdot \frac{1}{N} \cdot \sum_{1 \le k \le N} \frac{\bar{\Theta}_k^d - \Theta_k^d}{\bar{\Theta}_k^d} \tag{11} $$

MAPE denotes a percentage deviation, taking into account the sign, between the actual outcomes $\bar{\Theta}_k^d$ (in dataset $d$ for candidate $k$) and the forecasted outcomes $\Theta_k^d$.
RMSE denotes an unsigned absolute difference between actual results and predicted outcomes. In Equations (11) and (12), N represents the number of candidates assessed. An average MAPE and RMSE can be calculated across all election years.
$$ RMSE_d = \sqrt{\frac{1}{N} \sum_{1 \le k \le N} \left( \bar{\Theta}_k^d - \Theta_k^d \right)^2} \tag{12} $$
Moreover, in our comparative analysis, we utilize the statistical accuracy (Acc) metric. Accuracy is the ratio of correctly classified instances, including both true positives and true negatives, to the total instances. We specifically measure the successful prediction of the election winner from each of the 10 election datasets based on the estimated popular vote.
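A minimal Python sketch of the three error metrics follows. Note that, as a hedge, the MAPE here takes the absolute value of each per-candidate term, which is the conventional variant; Equation (11) in the text retains the sign:

```python
import numpy as np

def tpe(actual, predicted):
    """Total positive error (Equation (10)): sum of absolute point
    deviations over all N candidates."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sum(np.abs(a - p)))

def mape(actual, predicted):
    """Mean absolute percentage error; this sketch uses the absolute
    value of each per-candidate term (a common variant)."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(100.0 * np.mean(np.abs((a - p) / a)))

def rmse(actual, predicted):
    """Root mean squared error (Equation (12))."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

# Hypothetical two-candidate example: ground truth vs. forecast (in points)
truth, forecast = [51.8, 46.9], [50.0, 48.0]
print(round(tpe(truth, forecast), 2))  # 2.9
```

All three functions take the ground-truth and forecasted percentages for the N candidates of one dataset, exactly as the equations above prescribe.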
Finally, we note that we did not take into account any dataset-specific or geo-political characteristics when choosing the parameters of the TA model, because our intention is to provide a model that can tackle, to the best possible extent, any electoral dataset, without significant restrictions or drawbacks. Thus, we used fixed values for the model parameters in all experiments, namely magnitude μ = 1 for both ETA and PTA, and damping δ = 1 for ETA and δ = 3 for PTA. We repeated measurements for increasing values of μ and δ (in the interval 0.05–5) and chose the single combination of μ and δ that yielded the lowest average TPE over all ten datasets. Thus, our results summarize the average performance of TA based on the best trade-off combination of μ and δ. Individually, better combinations of these parameters can be found for each dataset, but not on average.
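The parameter selection described above amounts to an exhaustive grid search over the 0.05–5 interval. The snippet below is an illustrative sketch: `toy_forecast`, the two-dataset `TRUTH` table, and the error shape are hypothetical stand-ins for the real ETA/PTA forecasts, constructed so that the best trade-off lands at μ = 1, δ = 3:

```python
import itertools
import numpy as np

def avg_tpe(forecast_fn, datasets, truth, mu, delta):
    """Average total positive error of a forecast over all datasets."""
    errs = [np.sum(np.abs(np.asarray(truth[d]) -
                          np.asarray(forecast_fn(d, mu, delta))))
            for d in datasets]
    return float(np.mean(errs))

def select_global_parameters(forecast_fn, datasets, truth,
                             grid=np.round(np.arange(0.05, 5.05, 0.05), 2)):
    """Return the single (mu, delta) pair with the lowest average TPE
    across all datasets, scanning the interval 0.05-5."""
    best, best_err = None, float("inf")
    for mu, delta in itertools.product(grid, grid):
        err = avg_tpe(forecast_fn, datasets, truth, mu, delta)
        if err < best_err:
            best, best_err = (float(mu), float(delta)), err
    return best, best_err

# Hypothetical ground-truth table and toy forecast: the error shrinks to 0
# as (mu, delta) approach (1, 3), mimicking the paper's chosen PTA values
TRUTH = {"ARG'23": [44.3, 37.0], "TRK'23": [52.2, 47.8]}

def toy_forecast(dataset, mu, delta):
    offset = 0.5 * (abs(mu - 1.0) + abs(delta - 3.0))
    return [v + offset for v in TRUTH[dataset]]

params, err = select_global_parameters(toy_forecast, list(TRUTH), TRUTH)
print(params)  # best trade-off at (1.0, 3.0)
```

The key design point is that a single (μ, δ) pair is chosen globally, by average TPE, rather than tuned per dataset.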

4. Results

An effective electoral forecasting approach must meet two criteria: (i) it must yield poll estimates for all candidates that closely match actual voting outcomes, and (ii) it must reliably predict the election winner. We evaluate the first stated property using the total positive error TPE, MAPE, and RMSE (refer to Equations (10)–(12)). A reduced TPE indicates better forecasting, while a MAPE and RMSE closer to 0 translate into greater forecast accuracy.
In Table 2, the TPE metric is shown, while Table 3 provides a summary of MAPE and RMSE between each forecasting technique and the ground-truth election outcomes. On average, across all datasets, the BPP exhibits a total positive error TPE of 7.65 points, while SA and CC show much higher errors of 17.36 and 16.09 points, respectively. AR is situated ‘in-between’, with a TPE of 9.81 points. Our methods present the smallest TPEs, with only 6.95 points for ETA and 6.92 points for PTA. Furthermore, the mean MAPE values are 11.17 for ETA, 11.03 for PTA, 12.91 for BPP (i.e., 17% higher than PTA), 15.48 for AR, 42.98 for SA, and 47.73 for CC. Regarding RMSE, ETA registers 1.89 points, PTA 1.88, BPP 2.12 (i.e., 13% higher than PTA), AR 2.93, SA 4.76, and CC 4.23.
Individual and mean results in Table 2 and Table 3 indicate that our TA methods outperform all other state-of-the-art forecasting methods. Compared to the BPP estimates, we measure average improvements in TPE of 0.70 points for ETA and 0.73 for PTA. Moreover, ETA and PTA show substantial improvements of 2.86 and 2.89 points, respectively, over AR estimations in terms of TPE.
Several statistical estimates are used to quantify the second stated property (ii). Initially, we calculate the lead, defined as the point difference between the winner and the runner-up of each election. For reference, the electoral percentages of all the candidates can be found in Table 1. Table 4 outlines the differences between the leads estimated by each forecasting measure as relative offsets from the actual lead (given in column Lead). For example, in dataset ARG’23, the measured lead is 6.70 points, resulting from the point difference C1−C3 found in Table 1; similarly, the ETA offset equals −5.58 points, indicating that ETA predicted a lead of 6.70 − 5.58 = 1.12 points between the top two candidates. The average lead offsets for each statistical method, found in the last line of Table 4, highlight the superior ability of our TA methods to predict the election winner. Specifically, ETA and PTA again exceed all other forecasting methods, with an average lead offset of −1.57, which is 60% lower (better) than the average lead offset of the BPP and 63% lower than the ARIMA average offset.
We further examine the extent to which the correct winning presidential candidate was identified in the ten election datasets. To assess prediction efficacy, we apply the statistical accuracy (Acc), as defined in Section 3.3. An analysis of the predictions of each statistical method in the ten datasets shows that the reference statistical estimates (SA, CC) are less effective, with SA correctly anticipating only 5 out of 10 presidential outcomes (Acc = 0.5), and CC successfully predicting 6 out of 10 (Acc = 0.6). Moreover, the BPP predicted 7 out of 10 (Acc = 0.7) winners, AR predicted 6 out of 10 (Acc = 0.6), whereas both ETA and PTA predict 9 out of 10 winners (Acc = 0.9). Notably, no method predicted the correct results in the ROM’24 dataset, due to the local pollsters’ large margin of error.
Additionally, Figure 3 emphasizes the strong forecasting capability of the TA methods when compared to competing methodologies. It is important to note that our TA techniques surpass BPP in terms of total positive error TPE in 6 out of 10 datasets (see Figure 3A). Here, all vertical bars represent the offset between the BPP TPE and the TPE of the compared forecasting method; thus, positive values represent higher performance compared to the BPP. Regarding MAPE, ETA and PTA achieve better performance than BPP in 7 out of the 10 election datasets (see Figure 3B). In terms of RMSE, the TA methods exceed the BPP performance in 7 out of 10 datasets. In general, both our TA methods demonstrate superior performance over the AR benchmark on 9/10 datasets.
Across all metrics tested, our TA methods clearly demonstrate superior forecasting accuracy in 6 out of 10 election datasets, and out of the 10 × 4 (datasets × other competing metrics) comparison scenarios, our TA methods are superior in 36/40 (90%) of comparisons. Figure 3A,B visually depict the offset between the BPP prediction errors (TPE and MAPE) and our TA methods’ prediction errors (i.e., $TPE_{BPP} - TPE_{TA}$). Thus, columns with values exceeding 0 indicate that our TA methods have better performance. Figure 3C shows a heat map indicating whether our ETA or PTA methods exceed the BPP benchmark (highlighted in green, otherwise in orange) for each electoral dataset. We note that the lowest performance of our TA methods is achieved on the CND’21 dataset, with a relative TPE offset of −1.67 to −1.48 compared to the BPP, whereas the highest performance is achieved on the ROM’24 dataset, with a TPE offset of 4.55 to 4.59. The mean TPE offsets, averaged across all ten datasets, are 0.70 for ETA and 0.73 for PTA. Finally, Figure 3D presents a heat map that highlights the correct prediction of the election winner (green cells) for each forecasting method in each of the ten datasets. For example, one can quickly assess that ETA missed only 1 in 10 winner predictions, while AR missed 4 in 10. Compared to the BPP, our TA methods manage a superior 90% winner prediction accuracy.

Real-Time Forecasting Performance Analysis

Next, our analysis is focused on assessing the applicability of our TA models during an ongoing pre-election timeframe. Thus, we employ the USA’20 and USA’24 datasets to evaluate prediction errors TPE, MAPE, and RMSE at various time points before the election date $t_e$. The USA’20 dataset includes 280 polls conducted over 651 days leading up to the elections, and the USA’24 dataset includes 143 polls conducted over 264 days. We opt to evaluate predictions and their respective performances at $t \in \{t_e - 50, t_e - 100, t_e - 150, t_e - 200\}$. Table 5 shows the total positive errors TPE across all six forecasting methods, with Real Clear Politics being used as the BPP for both US elections. The absolute point differences Δ, between PTA and BPP, AR, and CC, are given in the last three columns of Table 5. All differences Δ < 0 indicate the superior performance of PTA over the other forecasting method. Additional detailed experimental findings, including MAPE and RMSE measurements, are available in Appendix E.
Our analysis indicates that the forecast precision of the statistical techniques (AR, SA, CC) is primarily influenced by data volume, as their predictions gradually converge towards the real electoral outcomes. In contrast, the precision of the more advanced forecasting methods depends less on accumulating data, especially as the elections near, and more on the unpredictability of the political and social landscape. For instance, BPP showcases the highest variability, with fluctuations from a low TPE = 5.40 in June 2024 to a significantly higher TPE = 7.40 by the end of July 2024, before dropping back to TPE = 5.60 in October 2024. Our TA methodologies exhibit greater temporal stability compared to BPP, showing resilience against the social fluctuations observed in BPP. In USA’20, we note that candidate C1 (Democratic) reached their lowest popularity in April–May 2020, while candidate C2 (Republican) gained proportional popularity; this volatility (around $t_e - 150$) is reflected by the TA methods in assigning C2 an increased theoretical win probability. However, as the polls balanced out again in mid-summer 2020, our predictions improved, aligning more closely with a balanced result akin to the actual popular vote of 2020. Moreover, in USA’24, there was increased volatility, as the Democratic candidate was changed mid-summer. Here, even though BPP is characterized by a higher TPE of 5.40–9.40 during the whole pre-electoral period, our ETA and PTA methods maintain lower TPE values of 1.60–2.30. When looking at the TPE point differences Δ, we notice that PTA is superior to the other competing methods most of the time before the election date.
Finally, in Figure 4 we provide a visual representation of the evolution of the total positive error TPE during the pre-election period, up to 200 days before election day $t_e$. We mark several points in time with diamond shapes (from $t_e - 50$ to $t_e - 200$), where the color of each shape reflects the forecasting method with the lowest TPE at that moment. In addition, the colored timeline at the bottom of each panel highlights the best-predicting method at each moment in time.
We may easily conclude that PTA (violet) is, on average, the best forecasting method during the USA’20 pre-election period, while ETA (red) is, overall, the most performant method in the USA’24 pre-election period. In both Figure 4A,B, BPP displays the highest volatility in time and only manages to obtain a low TPE very close to the election date. Even though ETA and PTA use the same pre-election poll data, they maintain lower volatility and, overall, a much lower TPE over time. The ARIMA (AR) method is less volatile than BPP; however, its consistently high TPE makes it a less performant forecasting method.
Overall, our findings indicate that, in contrast to leading pollsters who depend on MRP validated by social, economic, and political trends, our ETA and PTA methods enhance their forecasting performance by focusing exclusively on the temporal convergence of public opinion, demonstrating a significant level of estimation precision [7,52,53].

5. Discussion and Conclusions

Our research distinguishes itself from prior electoral forecasting studies in multiple ways. Unlike straightforward statistical estimates such as cumulative counting (CC) and survey averaging (SA), our macro-scale temporal attenuation (TA) approach requires supplemental time-series data in the form of pre-election surveys specifying the date when each poll was released. In contrast to simply averaging the data (like SA), we input poll data into our computational framework, which is shaped by the temporal aspects of the opinion polls. Our two TA methods—namely ETA and PTA—both conceptualize opinion dynamics over time, portraying it as a function that rises when opinion poll data are introduced in the system and diminishes in its absence. We introduce an analogy with the “charging process of a capacitor” (i.e., injection of opinion) and its gradual discharge (i.e., a state of relaxation without new input). A capacitor charges along an exponential curve, much like how public opinion slowly shifts toward a particular viewpoint as more information is absorbed. Initially, it takes longer to influence the population, but as more people are exposed more often to the information, the consensus starts to form more rapidly. Similarly, the discharging rate can be influenced by external factors—the capacitor discharges based on the resistance in the circuit, and public opinion shifts in response to new information, events, or influential figures. The change is not instantaneous, but follows a similar exponential decay whose speed depends on the damping factor of our model.
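The capacitor analogy can be condensed into a small numerical sketch. The exponential charge/decay forms below are illustrative stand-ins, with `mu` and `delta` playing the roles of the magnitude and damping parameters; they do not reproduce the exact ETA/PTA equations from the Methods section:

```python
import numpy as np

def opinion_pulse(poll_days, mu=1.0, delta=1.0, horizon=60):
    """Illustrative 'capacitor' dynamic of the opinion pulse: each published
    poll injects a charge of magnitude mu; on days without polls the pulse
    relaxes via an exponential decay governed by the damping delta."""
    pulse = np.zeros(horizon)
    level = 0.0
    for t in range(horizon):
        if t in poll_days:
            level += mu                     # charging: new poll input
        else:
            level *= np.exp(-1.0 / delta)   # discharging: relaxation
        pulse[t] = level
    return pulse

# Hypothetical polls published on days 5, 10, and 30 of a 60-day window
p = opinion_pulse({5, 10, 30}, mu=1.0, delta=3.0)
```

Plotting `p` over time reproduces the qualitative behavior described above: sharp rises on poll days, followed by damped exponential relaxation whose speed is set by `delta`.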
In contrast to leading methods such as multilevel regression and post-stratification (MRP) [11,12,13], recently integrated by top US pollsters, our TA does not require political, economic, or demographic context details associated with the election. TA is based on a fundamentally different process compared to other forecasting methods which rely on various socio-economic indicators. The only approach to incorporate additional indicators into the model would be to translate that data into weights for each published poll, but that would change the model significantly, and is not within the scope of this study. This characteristic provides a strategic edge, allowing TA to be employed in any global political region, provided there are sufficient, reliable public polling data. Consistent with the paper’s case study, which focuses on elections from nine different countries, we did not include any particular socio-economic information.
Our TA computational framework seeks to improve the predictability of the popular vote. While specialized studies address unique electoral systems, such as the US electoral-college system [12,13] or direct popular-vote systems such as that of France [4], the TA forecasting model has been developed for broader application beyond specific political contexts, provided there are adequate and trustworthy pre-election polling data. Although this may seem to disadvantage the TA model within systems such as the US, TA still exhibits superior performance in practice. Furthermore, unlike other models that may require adaptation to different countries, the TA model operates efficiently without such modifications.
Being context-independent makes TA particularly robust in volatile political environments, where traditional forecasting models might struggle due to incomplete socio-economic data. Even in politically unstable regions, where external influences may disrupt voter sentiment, TA remains applicable as long as polling data are available and reflects evolving public opinion trends. Nevertheless, in regions where media access is heterogeneous, the dissemination of poll results may be asymmetric, leading to delayed or distorted opinion shifts among certain voter segments. Since TA does not directly incorporate media influence, its forecasts will reflect the polls themselves, meaning that any media-induced biases in polling data will be reflected in the predictions. However, the TA model’s reliance on temporal patterns rather than media-driven demographic adjustments makes it less vulnerable to the amplification of biases seen in models that require political or socio-economic inputs.
This research begins with the assumption that social networks amplify opinions derived from established public opinion polls because of their substantial media exposure. Recent studies on how adults inform themselves about political candidates and issues indicate that TV news leads with 73%, followed by news websites/apps at 45%, newspapers at 24%, and social media at 21%. These figures support our assumption, as poll-based opinions are disseminated across all these media channels [54]. Moreover, despite the diversity of current media types, their collective reach is substantial, including during electoral periods, ensuring the reliability of polling accuracy [55].
Alongside the comparison between TA and the best pollster predictions (BPPs) in each electoral system (see Appendix D for more information), an ARIMA (AR) benchmark was also employed, calibrated using the same pre-election datasets. On average, AR yields reliable forecasts, surpassing those of CC and SA, yet falling short when compared to BPP and the two TA methodologies proposed by us. A comparison benchmark was further utilized for the real-time feasibility analysis in two electoral systems (USA’20 and USA’24), where once again, our TA methods emerged as superior electoral predictors, displaying reduced sensitivity to the social fluctuations revealed by pre-election surveys.
Our current study also presents certain limitations that we further examine in detail. First, we view social networks and social media as a widespread means of information dissemination, yet there exist individuals who are so-called ‘non-users’ [56,57]. We incorporated the simplification of considering published opinion polls to reach all voters in a system mainly due to challenges related to obtaining data on offline users and the trustworthiness of such data. According to multiple official statistics, approximately 67–75% of the global population interacts with social media. Despite this, we maintain that our model’s simplification is valid, as research on political attitudes reveals no statistically significant differences between social media users and non-users concerning political awareness, values, or behavior [56].
Next, an additional simplification states that the modeled electoral system is relatively resistant to external manipulation, thus eliminating the need to account for data that are beyond our control (i.e., external influences). Based on the ‘liberal democracy index’, which was designed to assess the resilience of political systems (on a scale of 0 to 1), we measure values between 0.11 (Turkey) and 0.81 (France), with an average of 0.59, according to research conducted by the Swedish V-Dem institute [48]. Therefore, we can consider electoral systems such as the Argentinian, Brazilian, Canadian, French, and US ones to be robust in terms of external manipulation; the Indonesian, Polish, and Romanian systems are borderline, while the Turkish system is considered less robust in terms of opinion manipulation. Nevertheless, we wanted to showcase how TA performs in various systems around the world and simply consider possible opinion manipulation to be reflected in the dynamics of the pre-election polls.
It is also worth noting that the presented results summarize the average performance of TA based on the best trade-off combination of the model parameters ( μ and δ ). Given the globally chosen model parameters, the results on some datasets will, naturally, be weaker than the competing pollsters.
Another limitation of our TA model is related to data dependency, namely the strong reliance on pre-election poll data. The accuracy of TA predictions depends, like that of most forecasting models, on the availability and quality of these polls: if polling data are sparse or inconsistent, the model's performance may degrade. Regional bias could also prove to be a limitation in forecasting; for example, polling accuracy is considered to vary widely between established democracies and systems with weaker electoral transparency. Our model does not directly integrate social media sentiment analysis, which could help the TA model capture opinion shifts triggered by viral events directly, rather than indirectly through shifting opinion polls.
Finally, in terms of handling unreliable or manipulated data, we consider that the results of any data-dependent model are only as good as the data they rely on. We used publicly available data that are considered to originate from reputable pollsters in each country. In other words, we did not have access to data that are knowingly manipulated. An interesting scenario is that of the recent Romanian presidential elections (first round), where no pollster was able to predict or capture the fact that a relatively unknown and unaccounted for candidate would win the first round of elections, surpassing all big parties. Thus, the registered errors for all pollsters and our TA are very high compared to the other datasets (see ROM’24 in Table 2 and Table 3). This situation proves that with low-quality opinion polls, forecasting methods will also provide weak results.
Future analyses should also prioritize examining current trends in voter polarization and the reliability of polls [58]. However, our model for predicting election results is intentionally structured to remain largely independent of various societal and political contexts, such as the influence of polarized opinions.

Conclusions

With the surge in data availability and computational capabilities, contemporary election forecasting models are likely to develop in one of two directions: either micro-level systems utilizing detailed social media data, or macro-level systems using advanced data-science methods to analyze demographic and economic indicators. Nevertheless, our proposed methodology strikes a balance between these micro and macro perspectives, resulting in a straightforward, intuitive, and robust approach that can be applied to any pre-election dataset with temporal attributes. We consider this streamlined approach to be effective, because social influence frequently aligns with the concept of crowd wisdom [52]. In essence, the collective evaluation of a large group (macro level) can surpass the precision of individual experts’ assessments (micro level) [53]. This phenomenon becomes more pronounced when larger populations are considered [52].
Although the TA methodology might seem simplified due to its reliance on a micro-scale opinion interaction model to predict macroscopic outcomes, our findings suggest that temporal awareness plays a more crucial role in election prediction than previously acknowledged. In our analysis, TA surpasses advanced election-forecasting methods in 6 of the 10 electoral datasets studied. TA achieves an average forecasting error of 6.92–6.95 points, while statistical methods yield errors ranging from 16.09 to 17.36 points, AR results in an average error of 9.81 points, and the BPP error lies at 7.65 points. This signifies an approximate 10–41% enhancement in the accuracy of popular vote predictions using our approach, compared to the BPP and AR, respectively.
Examining the techniques employed by US pollsters, such as Real Clear Politics, FiveThirtyEight, and Understanding America Study, we have yet to identify any method based on temporal attenuation comparable with our methodology. Traditional statistics-based and data-science methodologies typically depend on particular social, economic, and political contexts for refining their forecasts. In contrast, our TA method functions independently of socio-economic contextual data, which we argue is a beneficial attribute. While achieving a flawless forecasting system is unlikely due to the complexities inherent to elections, our TA offers a unique and scientifically distinguishable alternative with demonstrated high efficacy.
Beyond electoral forecasting, the TA framework offers potential for broader applications. Its ability to model temporal dynamics and opinion evolution can extend to predicting outcomes in other domains where time-sensitive data play a critical role. Applications include natural disasters, by forecasting response patterns during emergencies, such as hurricanes or pandemics; epidemiology, by modeling the dissemination of health-related information or disease outbreaks; economic forecasting, by predicting market sentiment and economic trends by analyzing the temporal impact of external events on public opinion or consumer behavior. By adapting the TA methodology to other domains, researchers may improve forecasting accuracy and decision-making across various fields where time-sensitive patterns are crucial.

Funding

This research received no external funding.

Data Availability Statement

All real-world data used in this study are summarized in Table 1. All the pre-election polls used for this study are available as raw data on the web links enumerated in Appendix C.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Comparative Analysis Between Temporal Attenuation and Competing Methods

In this section, we offer a detailed comparison between our ETA and PTA methods and the available pollster predictions (PP) and CC for the ARG’23 and TRK’23 datasets. In the Argentinian Presidential elections of 2023, the BPP were obtained by the pollster ‘Consultora Tendencias’; in the Turkish 2023 elections, the BPP were obtained by ‘BETİMAR’.
The purpose of this analysis is to demonstrate the effectiveness of our two TA methods in different electoral systems, without the need for prior knowledge of socio-economic aspects. Also, the chosen datasets consist of only 88 and 90 pre-election polls, over pre-election periods of 281 and only 50 days, respectively; nevertheless, the TA methods manage to provide reliable forecasting results, as shown in what follows.
Figure A1 and Figure A2 examine the general opinion $\Theta_k(t)$ over the last month (i.e., 30 days) preceding the election date. We chose to focus only on the top three (C1–C3) and top two (C1–C2) candidates, respectively, in each dataset, which were leading the polls. The predictions obtained with CC show negligible fluctuation in both datasets. In ARG’23, ETA and PTA display non-overlapping shifts in $\Theta_k(t)$ until the last few days before the election, when candidate C1 slightly surpasses C3. A similar conclusion can be drawn from the pollster predictions (PPs), where we observe higher fluctuations. In TRK’23, CC does not provide any insight into the dynamics just before election day. On the other hand, ETA, PTA, and PP display consistent dynamics with several shifts between C1 and C2: candidate C1 surpasses C2 in the predicted polls five times over the last 30 days preceding the election date.
Figure A1. The evolution of the general opinion Θ k ( t ) towards the top three candidates (C1–C3), over the last 30 days, before the 2023 Argentinian presidential elections (ARG’23). (A) Exponential temporal attenuated (ETA) method. (B) Power-law temporal attenuated (PTA) method. (C) Cumulative counting (CC) method. (D) Pollster predictions (PP).
Figure A2. The evolution of the general opinion Θ k ( t ) towards the top two candidates (C1–C2), over the last 30 days, before the 2023 Turkish presidential elections (TRK’23). (A) Exponential temporal attenuated (ETA) method. (B) Power-law temporal attenuated (PTA) method. (C) Cumulative counting (CC) method. (D) Pollster predictions (PP).
To quantify the observations depicted in Figure A1 and Figure A2, additional statistical measures are presented in Table A1 and Table A2. The CC method shows minimal dynamics, exhibiting opinion shifts of only 0.14 points for the election winner (C1) and 2.11 points for the runner-up (C3) in ARG’23, respectively, and 0.90 (C1) and 1.30 (C2) in TRK’23. The PPs display, on average, the most notable dynamism in the ARG’23 elections, varying the winner forecast by 8 points, also with the highest standard deviations of σ = 2–2.21. In TRK’23, the PP also maintains the highest dynamism, with estimation variations of Δ = 9.90–10 for the top candidates and a σ = 2.75–2.88. Our TA methods fall between CC and PP in terms of forecasting variation, with narrower ranges; notably, PTA is more stable than ETA, changing the forecast by 4.20 points (C1) and 7.16 points (C3) in ARG’23, compared to ETA changes of 6.08 points (C1) and 10.41 points (C3). In the TRK’23 dataset, the ETA and PP methods show similar forecast variability, 9.55–9.90 points for the winner (C1) and 8.94–10 points for the runner-up (C2). The PTA method is again more stable, with a variation of only 8.10 points for C1 and 5.92 points for C2, and a standard deviation of σ = 1.63–2.05.
Table A1. The minimum, maximum, difference between maximum and minimum ( Δ , and as percentage Δ ( % ) ), and standard deviation ( σ ) of the general opinion Θ for the top two candidates, during the last 30 days before the elections, in the Argentinian presidential elections ARG’23 dataset.
| ARG’23 | min{Θ} | max{Θ} | Δ | Δ(%) | σ |
|--------|--------|--------|------|-------|------|
| Candidate C1 | | | | | |
| ETA | 27.03 | 33.11 | 6.08 | 22.49 | 1.62 |
| PTA | 27.95 | 32.15 | 4.20 | 15.03 | 1.16 |
| CC | 31.39 | 31.53 | 0.14 | 0.45 | 0.04 |
| PP | 24.70 | 32.70 | 8.00 | 32.39 | 2.00 |
| Candidate C3 | | | | | |
| ETA | 29.12 | 39.53 | 10.41 | 35.75 | 2.16 |
| PTA | 31.54 | 38.70 | 7.16 | 22.70 | 1.48 |
| CC | 27.38 | 29.49 | 2.11 | 7.71 | 0.67 |
| PP | 26.50 | 36.20 | 9.70 | 36.60 | 2.21 |
Table A2. The minimum, maximum, difference between maximum and minimum (Δ, also as percentage Δ(%)), and standard deviation (σ) of the general opinion Θ for the top two candidates, during the last 30 days before the elections, in the Turkish presidential elections TRK’23 dataset.

| TRK’23 | min{Θ} | max{Θ} | Δ | Δ(%) | σ |
|---|---|---|---|---|---|
| Candidate C1 | | | | | |
| ETA | 41.64 | 51.19 | 9.55 | 22.93 | 2.52 |
| PTA | 42.15 | 50.25 | 8.10 | 19.22 | 2.05 |
| CC | 45.04 | 45.94 | 0.90 | 2.00 | 0.24 |
| PP | 41.50 | 51.40 | 9.90 | 23.86 | 2.88 |
| Candidate C2 | | | | | |
| ETA | 43.52 | 52.46 | 8.94 | 20.54 | 2.34 |
| PTA | 45.12 | 51.04 | 5.92 | 13.12 | 1.63 |
| CC | 46.58 | 47.88 | 1.30 | 2.79 | 0.37 |
| PP | 43.10 | 53.10 | 10.00 | 23.20 | 2.75 |
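The descriptive statistics reported in Table A1 and Table A2 are straightforward to reproduce from any daily opinion series. Below is a minimal Python sketch; the series used here is hypothetical (in the article, the Θ values come from the TA framework), and Δ(%) is computed relative to the minimum, which is consistent with the reported values:

```python
import statistics

def opinion_stats(theta):
    """Minimum, maximum, range (Delta), relative range (Delta %), and
    population standard deviation (sigma) of a daily opinion series."""
    lo, hi = min(theta), max(theta)
    delta = hi - lo
    return {
        "min": lo,
        "max": hi,
        "delta": delta,
        "delta_pct": 100 * delta / lo,   # range relative to the minimum
        "sigma": statistics.pstdev(theta),
    }

# Hypothetical 30-day opinion series for one candidate (illustrative only).
theta_c1 = [27.0 + 0.2 * t for t in range(30)]
stats = opinion_stats(theta_c1)
```

Applying `opinion_stats` to each method's 30-day Θ series for a candidate yields one row of Table A1 or Table A2.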
Table A3 presents the number of days on which each of the top two candidates (winner and runner-up) was leading or tied in the final 30 days before the ARG’23 and TRK’23 elections. Counting the changes in the predicted opinion highlights the increased volatility of the Turkish forecasts. Few shifts were noted in the Argentinian election, as the runner-up (C3) consistently led during the last 30 days, until just before election day. Moreover, Figure A3 illustrates the changing lead, measured as the opinion difference between the winner and runner-up candidates. From Figure A3A, we notice that, preceding the ARG’23 election, the lead (C1–C3) was negative up to the very last day before the election. This means that candidate C3 was leading consistently, and the plot captures the moment when C1 surpassed C3, just days prior to the election, to win it. In Figure A3B we notice much higher volatility preceding the TRK’23 elections, as candidates C1 and C2 alternated in the lead six times over 30 days.
Table A3. The number of days predicting the real election winner as victorious (W-Win), the runner-up as election winner (R-Win), instances of ties between the two top candidates (Tie), and transitions in the leading candidate (Changes) over the final 30 days preceding the ARG’23 (top) and TRK’23 (bottom) elections.

| | W-Win | R-Win | Tie | Changes |
|---|---|---|---|---|
| ARG’23 (W = C1, R = C3) | | | | |
| ETA | 3 | 27 | 0 | 2 |
| PTA | 2 | 28 | 0 | 1 |
| CC | 30 | 0 | 0 | 0 |
| PP | 2 | 28 | 0 | 2 |
| TRK’23 (W = C1, R = C2) | | | | |
| ETA | 8 | 22 | 0 | 6 |
| PTA | 8 | 22 | 0 | 5 |
| CC | 0 | 30 | 0 | 0 |
| PP | 10 | 20 | 0 | 6 |
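The day counts and lead changes in Table A3 follow from a direct comparison of the winner's and runner-up's daily opinion series. Below is a sketch of the counting procedure, applied to short hypothetical series; a tie day is counted separately and does not register a lead change:

```python
def lead_summary(winner, runner_up):
    """Count the days the eventual winner leads (W-Win), the runner-up
    leads (R-Win), tie days, and the number of lead changes."""
    w_win = r_win = ties = changes = 0
    prev = None  # previous strict leader: "W" or "R"
    for w, r in zip(winner, runner_up):
        if w > r:
            leader, w_win = "W", w_win + 1
        elif r > w:
            leader, r_win = "R", r_win + 1
        else:
            leader, ties = prev, ties + 1  # a tie keeps the previous leader
        if leader is not None and prev is not None and leader != prev:
            changes += 1
        prev = leader
    return w_win, r_win, ties, changes
```

Running this over a 30-day window of the two leading candidates' Θ series produces one row of Table A3.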
Figure A3. The evolution of the lead (opinion difference between winner and runner-up), as predicted by ETA, PTA, cumulative counting, and pollster predictions, in the final 30 days of the ARG’23 and TRK’23 presidential elections. (A) The ARG’23 dataset displaying the opinion difference between candidates C1–C3. (B) The TRK’23 dataset displaying the opinion difference between candidates C1–C2.
Finally, we present a summary analysis of the opinion dynamics of all secondary candidates combined. Figure A4 illustrates that the polls in the 30 days preceding the election indicate an increased voter inclination towards the secondary candidates; however, the CC and PP estimations allocate relatively lower percentages to these candidates, ranked with lower chances of winning. In ARG’23 (see Figure A4A), CC begins with a higher estimation of the total votes for the secondary candidates (≈8 points for C4 + C5); towards the election date, the other three forecasting methods surpass CC, allocating ≈9–13 points to these candidates. In TRK’23, the CC allocation towards the secondary candidates (C3 + C4) remains the highest throughout the last 30 days before the election. On the other hand, ETA and PTA evolve very similarly in time, while the PP predictions lag behind by approximately two days.
Figure A4. The evolution of the cumulative opinion towards the candidates with negligible chances of winning, over the last 30 days before the election. (A) The ARG’23 dataset with candidates C4 + C5. (B) The TRK’23 dataset with candidates C3 + C4.
To summarize this in-depth pre-election analysis, we note that:
  • Our TA methods (ETA, PTA) exhibit slightly less volatility compared to the pollster predictions (PPs) in the 30 days leading up to the election.
  • The forecasting ranges for TA are narrower than those of the pollster predictions.
  • Except for CC, which yields no insight into the opinion dynamics, we note fewer or less pronounced shifts in the predicted election outcomes with ETA and PTA compared to the pollster predictions.

Appendix B. Numerical Processing of the Electoral Computational Framework

To further support the description of the TA algorithm provided in Section 2.3 and Figure 2, we detail here a numerical example of processing raw election data, based on the Indonesian IDO’24 dataset. The raw pre-election polls are found on the dedicated Wikipedia page in the generic form of a list of pollster names, the date (or interval) of each poll, and the respective results for each candidate. If a poll interval is provided, we consider the last day of the poll as the publication date.
In Figure A5A, an illustrative list of pre-election pollster predictions is given for three competing candidates C1–C3, ranging from 31 October 2023 to 8 February 2024. The election was held, and the final results obtained, on 14 February 2024. These raw data are first pipelined to the RawDatasetParser (see Figure 2A), which creates a chronologically ordered list (i.e., from the oldest poll to the last poll before the elections) of poll dates in the format MM-DD-YY (month-day-year). In addition, a separate list of normalized vote percentages is allocated to each candidate, corresponding to each date on which a poll was published; for all other dates, when no polls were published, zeros are filled in as percentages. The resulting lists are concatenated in the DateVoteParser (see Figure 2B), and the dates are removed, leaving the set Ω of daily opinion for each candidate (exemplified in Figure A5B). Finally, the set Ω is sent to the TA_Simulator, which computes the daily opinion pulses Ψ k ( t ) for each candidate; from these, the general opinion Θ is extrapolated as a normalized percentage of each pulse. Figure A5C shows the final numerical results for the opinion towards candidates C1–C3. The final predicted opinion towards the candidates is on the last row, namely C1: 57.4%, C2: 25.27%, and C3: 17.33%.
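The processing pipeline described above can be sketched in a few lines of Python. The function names below mirror the RawDatasetParser/DateVoteParser stages of Figure 2; the pulse update inside `general_opinion` is an illustrative exponential-decay stand-in (decay rate `delta` is a hypothetical parameter), not the exact ETA/PTA equations of Section 2.3:

```python
from datetime import date
import math

def parse_polls(raw_polls, election_day):
    """Mimic the RawDatasetParser/DateVoteParser steps (Figure 2A,B):
    sort the polls chronologically, normalize each poll to percentages,
    and fill days without polls with zero vectors."""
    polls = sorted(raw_polls, key=lambda p: p[0])   # (date, votes) pairs
    n_cand = len(polls[0][1])
    t0 = polls[0][0]
    t_e = (election_day - t0).days                  # all polls precede t_e
    omega = [[0.0] * n_cand for _ in range(t_e)]    # zeros where no poll
    for day, votes in polls:
        t = (day - t0).days
        total = sum(votes)
        omega[t] = [100.0 * v / total for v in votes]   # normalize to %
    return omega

def general_opinion(omega, delta=0.1):
    """Illustrative stand-in for the TA_Simulator: each candidate's pulse
    decays exponentially and is boosted by new polls; the general opinion
    Theta is the pulse normalized across candidates (assumed form)."""
    n_cand = len(omega[0])
    psi = [0.0] * n_cand
    for w in omega:
        psi = [p * math.exp(-delta) + x for p, x in zip(psi, w)]
    s = sum(psi)
    return [100.0 * p / s for p in psi]
```

For instance, two hypothetical polls published on 1 and 3 January, for an election on 5 January, yield a four-day Ω set with two zero-filled days, from which a normalized Θ vector is produced.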
Figure A5. A numerical example of the processing of raw pre-election poll data (example based on the Indonesian IDO’24 dataset). (A) The raw pre-election poll data, in inverse chronological order, under the generic form (e.g., as found on Wikipedia) consisting of the pollster name, date (interval) of the poll, and number of votes or percentages toward each candidate. (B) Raw data are processed and split into the two vectors for candidate percentages (normalized, if necessary) and poll dates (same inverse chronological ordering remains, but in MM-DD-YY format). (C) The two vectors are joined and assigned in chronological order for each day on the pre-election time axis. If no poll was published on a date, the votes are filled in with zeros. (D) The ordered opinion poll set Ω is used for computing the opinion pulses Ψ k ( t ) towards each candidate, on each day 0 t < t e . Based on the pulses Ψ , the general opinion Θ k ( t ) is computed in time for each candidate C1–C3.

Appendix C. Pre-Electoral Datasets

Eight of the ten electoral datasets used to validate our ETA and PTA methods are curated from the dedicated Wikipedia page of each respective presidential election. The two US election datasets are taken from the RealClearPolitics website.
It should be noted that the data available under each link are in raw format and need to undergo the step-by-step processing explained in Figure 2 and Appendix B, Figure A5.
The election datasets, together with the web links and last edit times where the raw pre-election poll data can be found, are as follows:

Appendix D. Best Pollster Predictions and Election Candidates

In our validation methodology, we utilize ARIMA (AR) and the best pollster prediction (BPP) estimates as benchmarks for our two TA methods. This appendix provides details on the BPP data. We use the term “best pollster” to describe the pollster with the minimal estimation error, measured by the total positive error TPE (see Equation (10)), compared to the ground-truth results (see Table 1). Here, Table A4 lists the best (and second-best) pollsters for each election in each of the ten datasets, along with their TPE, MAPE, and RMSE values and their predictions for the election candidates. Second-best pollsters are given only when the BPP does not score the best TPE, MAPE, and RMSE overall, as in the cases of ARG’23, BRZ’22, and CND’21.
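For concreteness, the three error measures and the BPP selection rule can be sketched as follows. Here, TPE is taken as the sum of absolute deviations from the ground truth, which reproduces the reported values (e.g., the USA’20 best pollster in Table A4), while MAPE and RMSE follow their standard definitions:

```python
import math

def tpe(pred, actual):
    """Total positive error: sum of absolute deviations (cf. Equation (10))."""
    return sum(abs(p - a) for p, a in zip(pred, actual))

def mape(pred, actual):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(p - a) / a for p, a in zip(pred, actual)) / len(actual)

def rmse(pred, actual):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def best_pollster(pollsters, actual):
    """Select the pollster with minimal TPE (the BPP benchmark)."""
    return min(pollsters, key=lambda name: tpe(pollsters[name], actual))

# USA'20 example: IBD/TIPP estimated 50-46 against the 51.4-46.9 outcome.
actual = [51.4, 46.9]
print(round(tpe([50, 46], actual), 2))   # 2.3, matching Table A4
```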
Furthermore, we list the names of all candidates numbered as C k in each dataset:
  • ARG’23: C1—Sergio Massa, C2—Patricia Bullrich, C3—Javier Milei, C4—Myriam Bregman, C5—Juan Schiaretti.
  • BRZ’22: C1—Jair Bolsonaro, C2—Luiz Inácio Lula da Silva, C3—Ciro Gomes, C4—Simone Tebet.
  • CND’21: C1—CPC (Erin O’Toole), C2—LPC (Justin Trudeau), C3—NDP (Jagmeet Singh), C4—BQ (Yves-François Blanchet), C5—GPC (Annamie Paul), C6—PPC (Maxime Bernier).
  • FRA’22: C1—Jean-Luc Mélenchon, C2—Emmanuel Macron, C3—Marine Le Pen, C4—Éric Zemmour, C5—Others.
  • IDO’24: C1—Prabowo Subianto, C2—Anies Baswedan, C3—Ganjar Pranowo.
  • POL’20: C1—Andrzej Duda, C2—Rafał Trzaskowski, C3—Others.
  • ROM’24: C1—Mircea Geoană, C2—Marcel Ciolacu, C3—George Simion, C4—Nicolae Ciucă, C5—Elena Lasconi, C6—Călin Georgescu.
  • TRK’23: C1—Recep Tayyip Erdoğan, C2—Kemal Kılıçdaroğlu, C3—Sinan Oğan, C4—Muharrem İnce.
  • USA’20: C1—Joe Biden, C2—Donald Trump.
  • USA’24: C1—Kamala Harris, C2—Donald Trump.
Table A4. The top pollsters from each election dataset, their candidate predictions, and the associated estimation errors TPE, MAPE, and RMSE.

| Dataset | BPP | C1 | C2 | C3 | C4 | C5 | C6 | TPE | MAPE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| ARG’23 | Consultora Tendencias | 29.8 | 23.1 | 30.2 | 3.9 | 4.6 | – | 11.21 | 19.83 | 3.28 |
| | Atlas Intel | 30.9 | 24.4 | 26.5 | 3.2 | 10 | – | 13.55 | 19.15 | 3.36 |
| BRZ’22 | El Electoral | 38.5 | 48.5 | 5.5 | 5 | – | – | 8.07 | 28.03 | 2.68 |
| | IPEC | 34 | 47 | 5 | 5 | – | – | 13.43 | 27.22 | 4.77 |
| CND’21 | Research Co. | 32 | 32 | 19 | 7 | 4 | 6 | 6.9 | 19.64 | 1.23 |
| | Angus Reid | 32 | 30 | 20 | 7 | 3 | 5 | 7.9 | 10.95 | 1.59 |
| FRA’22 | Harris-Interactive | 18 | 27 | 24 | 8.5 | 22.5 | – | 9.64 | 11.55 | 2.26 |
| IDO’24 | SPIN | 54.8 | 24.3 | 16.1 | – | – | – | 4.81 | 3.77 | 2.23 |
| POL’20 | Indicator | 42.31 | 31.03 | 25.52 | – | – | – | 1.98 | 1.82 | 0.77 |
| ROM’24 | Atlas Intel | 6.9 | 23.7 | 16.9 | 14.3 | 17.8 | 8.1 | 29.9 | 31.57 | 6.86 |
| TRK’23 | BETİMAR | 49.1 | 45 | 5.6 | 0.3 | – | – | 1.1 | 9.91 | 0.31 |
| USA’20 | IBD/TIPP | 50 | 46 | – | – | – | – | 2.3 | 2.32 | 1.17 |
| USA’24 | Atlas Intel | 49 | 50 | – | – | – | – | 2.3 | 2.36 | 1.20 |

Appendix E. Forecasting Performance During the Pre-Election Period

We conduct a feasibility analysis of our TA-based computational framework applied at several moments in time during the pre-election periods of the USA’20 and USA’24 elections. Table A5 shows interim poll estimates from the pre-election phase of the two datasets, recorded at t = { t e − 50 , t e − 100 , t e − 150 , t e − 200 } days. The analyzed period spans the 200 days before election day t e ; this translates into the intervals 17 April–3 November 2020 for the USA’20 elections and 19 April–5 November 2024 for the USA’24 elections. Estimates for only the top two candidates (C1, C2) are presented as percentage points. The forecasting performance metrics include the total positive error (TPE), mean absolute percentage error (MAPE), and root mean squared error (RMSE).
Table A5. Electoral poll forecasts and performance metrics at four moments in time before the USA’20 and USA’24 elections. Dates are relative to the election dates t e (3 November 2020 and 5 November 2024, respectively).

| Method | USA’20 C1 | C2 | TPE | MAPE | RMSE | USA’24 C1 | C2 | TPE | MAPE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| Final results | 51.4 | 46.9 | | | | 48.4 | 50.0 | | | |
| t = t e (election day) | | | | | | | | | | |
| BPP | 50 | 46 | 2.3 | 2.32 | 1.18 | 49 | 50 | 0.6 | 0.62 | 0.42 |
| ETA | 51.85 | 48.15 | 1.7 | 1.77 | 0.94 | 49.54 | 50.46 | 1.6 | 1.64 | 0.87 |
| PTA | 51.86 | 48.14 | 1.7 | 1.77 | 0.94 | 49.54 | 50.46 | 1.6 | 1.64 | 0.87 |
| AR | 51.28 | 42.41 | 4.61 | 4.9 | 3.18 | 49.37 | 48.62 | 2.35 | 2.38 | 1.19 |
| SA | 49.92 | 42.41 | 5.97 | 6.23 | 3.34 | 48.15 | 47.43 | 2.82 | 2.83 | 1.83 |
| CC | 53.99 | 46.01 | 3.48 | 3.47 | 1.94 | 50.29 | 49.71 | 2.18 | 2.24 | 1.35 |
| t = t e − 50 | | | | | | | | | | |
| BPP | 52 | 42 | 5.5 | 5.81 | 3.49 | 52 | 48 | 5.6 | 5.72 | 2.91 |
| ETA | 53.86 | 46.14 | 3.22 | 3.2 | 1.82 | 50.02 | 49.98 | 1.64 | 1.69 | 1.15 |
| PTA | 53.85 | 46.15 | 3.2 | 3.18 | 1.81 | 50.35 | 49.65 | 2.3 | 2.36 | 1.4 |
| AR | 49.53 | 42.07 | 6.7 | 6.97 | 3.66 | 49.45 | 47.14 | 3.91 | 3.94 | 2.15 |
| SA | 49.6 | 42.28 | 6.42 | 6.68 | 3.51 | 47.47 | 47.14 | 3.79 | 3.82 | 2.13 |
| CC | 53.99 | 46.01 | 3.48 | 3.47 | 1.94 | 50.12 | 49.88 | 1.84 | 1.9 | 1.22 |
| t = t e − 100 | | | | | | | | | | |
| BPP | 55 | 45 | 5.5 | 5.53 | 2.88 | 46 | 45 | 7.4 | 7.48 | 3.92 |
| ETA | 55.32 | 44.68 | 6.14 | 6.18 | 3.19 | 49 | 51 | 1.6 | 1.62 | 0.82 |
| PTA | 55.36 | 44.64 | 6.22 | 6.26 | 3.22 | 49.75 | 50.25 | 1.6 | 1.64 | 0.97 |
| AR | 49.66 | 41.32 | 7.32 | 7.64 | 4.13 | 45.9 | 47.76 | 4.74 | 4.82 | 2.37 |
| SA | 49.69 | 42.38 | 6.23 | 6.48 | 3.42 | 45.9 | 47.76 | 4.74 | 4.82 | 2.37 |
| CC | 54.03 | 45.97 | 3.56 | 3.55 | 1.97 | 48.93 | 51.07 | 1.6 | 1.62 | 0.84 |
| t = t e − 150 | | | | | | | | | | |
| BPP | 45 | 42 | 11.3 | 11.45 | 5.7 | 45 | 48 | 5.4 | 5.51 | 2.79 |
| ETA | 55.87 | 44.13 | 7.24 | 7.3 | 3.72 | 48.32 | 51.68 | 1.76 | 1.76 | 1.19 |
| PTA | 55.7 | 44.3 | 6.9 | 6.95 | 3.55 | 48.35 | 51.65 | 1.7 | 1.7 | 1.17 |
| AR | 48.92 | 42.8 | 6.58 | 6.78 | 3.39 | 45 | 49.5 | 3.9 | 4.01 | 2.43 |
| SA | 49.69 | 42.8 | 5.81 | 6.03 | 3.14 | 45 | 49.5 | 3.9 | 4.01 | 2.43 |
| CC | 53.78 | 46.22 | 3.06 | 3.04 | 1.75 | 48.31 | 51.69 | 1.78 | 1.78 | 1.2 |
| t = t e − 200 | | | | | | | | | | |
| BPP | 48 | 42 | 8.3 | 8.53 | 4.22 | 43 | 46 | 9.4 | 9.58 | 4.75 |
| ETA | 52.75 | 47.25 | 1.7 | 1.69 | 0.99 | 48.32 | 51.68 | 1.76 | 1.76 | 1.19 |
| PTA | 52.75 | 47.25 | 1.7 | 1.69 | 0.99 | 48.35 | 51.65 | 1.7 | 1.7 | 1.17 |
| AR | 49 | 43.04 | 6.26 | 6.45 | 3.21 | 43 | 46 | 9.4 | 9.58 | 4.75 |
| SA | 50.2 | 43.04 | 5.06 | 5.28 | 2.86 | 43 | 46 | 9.4 | 9.58 | 4.75 |
| CC | 53.96 | 46.04 | 3.42 | 3.41 | 1.91 | 48.31 | 51.69 | 1.78 | 1.78 | 1.2 |
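The interim forecasts in Table A5 amount to re-running a forecasting method on the truncated poll set available at each cutoff. Below is a sketch of this evaluation loop, where `forecast` is a placeholder for any of the compared methods (ETA, PTA, AR, SA, CC) and the poll data are hypothetical:

```python
def evaluate_at_cutoffs(polls, t_e, actual, forecast, offsets=(0, 50, 100, 150, 200)):
    """Re-run a forecasting method using only the polls published up to
    t_e - offset days, and report the prediction and its total positive
    error (sum of absolute deviations) at each cutoff."""
    results = {}
    for off in offsets:
        cutoff = t_e - off
        # keep only the polls available at this moment in time
        available = [(t, votes) for t, votes in polls if t <= cutoff]
        pred = forecast(available)
        err = sum(abs(p - a) for p, a in zip(pred, actual))
        results[off] = (pred, err)
    return results
```

For example, with a trivial `forecast` that averages the available polls, the error at each cutoff reflects how much the discarded late polls would have mattered.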

Figure 1. Proof-of-concept example of the two temporal attenuation (TA) models applied to three candidates (C1—blue, C2—green, C3—orange), each obtaining a variable number of votes in the pre-election (PE) polls spanning over t = 280 days before the election day. (A) Opinion pulse Ψ k ( t ) evolution for ETA corresponding to the poll vectors ω k ( t ) of each candidate, over the last 30 days before election. (B) General opinion Θ k ( t ) evolution for ETA corresponding to the pulse in panel (A). Here, the whole PE period is shown with all corresponding dynamics in opinion. On the upper dotted timeline, the virtual election winners are shown at several moments during the PE period. The last 30 days ( 250 < t < 280 ) before the election day are detailed in the right-most panel, predicting candidate C1 as the election winner. (C) Similarly, the opinion pulse Ψ k ( t ) evolution for PTA corresponds to the same poll vectors ω k ( t ) . (D) General opinion Θ k ( t ) evolution for PTA, exhibiting slightly distinct temporal patterns compared to ETA in panel (B). These small differences lead to a different predicted election winner (C3) than in the case of ETA.
Figure 2. Step-wise adaptation of our TA method. (A) The first phase involves parsing the pre-election poll dataset (yellow table, with example data) to extract a vector of dates for each poll in chronological order, and their corresponding votes normalized as percentages towards each candidate. (B) With polls sorted chronologically, the oldest poll’s date is set as t = 0 and the election day as t = t e . Then, each poll is assigned a relative date 0 t < t e . The set Ω is created from the normalized poll vectors ( ω k ( t ) ) for each available date, or an empty vector ( { 0 , 0 } ) if no poll exists on that date. (C) The third phase feeds Ω to our TA framework and calculates the opinion pulses Ψ k ( t ) based on parameters w k ( t ) ,   μ ,   δ ,   ω k ( t ) . Accordingly, the general opinion Θ k ( t ) for each candidate is derived from Ψ k ( t ) , and the results estimation is given as the output of the framework.
Figure 3. Comparison between the best pollster predictions (BPPs) and our TA methods, ETA (red) and PTA (violet), in terms of (A) the TPE forecasting errors and (B) the RMSE errors. All offsets above 0 indicate superior predictive performance of our TA methods. (C) Heat map showing the direct comparison, in terms of TPE, between BPP and our two TA methods. Darker green cells indicate superior performance of our TA methods, and darker orange cells translate to inferior performance. (D) Heat map highlighting the datasets (each of the 10 columns) in which the forecasting methods correctly predict the election winner.
Figure 4. The evolution of the poll-forecasting error TPE, for the (A) USA’20 and (B) USA’24 elections, of four forecasting methods (BPP, ETA, PTA, CC) at four moments in time before the election dates. The colored shapes at the bottom of each panel indicate the best predicting method at moment t. The color coding corresponds to the legend found under each figure panel.
Table 1. Election datasets used for validation, detailing the election date, the number of pre-election (PE) polls available in the dataset, the PE period, and the final election results expressed as percentages (%). We use the notations C1–C6 for a maximum of six candidates; or place a dash (–) otherwise.
| Elections/Year | Symbol | Election Date | PE Polls | PE Period | C1 | C2 | C3 | C4 | C5 | C6 |
|---|---|---|---|---|---|---|---|---|---|---|
| Argentina 2023 | ARG’23 | 22 October 2023 | 88 | 3 January–13 October 2023 | 36.68 | 23.83 | 29.98 | 2.7 | 6.78 | – |
| Brazil 2022 | BRZ’22 | 2 October 2022 | 69 | 2 July–1 October 2022 | 43.2 | 48.43 | 3.04 | 4.16 | – | – |
| Canada 2021 | CND’21 | 20 September 2021 | 366 | 3 January 2020–19 September 2021 | 33.7 | 32.6 | 17.8 | 7.6 | 2.3 | 4.9 |
| France 2022 | FRA’22 | 10 April 2022 | 242 | 3 January–8 April 2022 | 21.95 | 27.85 | 23.15 | 7.07 | 19.94 | – |
| Indonesia 2024 | IDO’24 | 14 February 2024 | 52 | 31 October 2023–8 February 2024 | 58.59 | 24.95 | 16.47 | – | – | – |
| Poland 2020 | POL’20 | 28 June 2020 | 53 | 15 May–26 June 2020 | 43.5 | 30.46 | 25.74 | – | – | – |
| Romania 2024 | ROM’24 | 24 November 2024 | 36 | 24 January–22 November 2024 | 6.32 | 19.15 | 13.86 | 8.79 | 19.18 | 22.94 |
| Turkey 2023 | TRK’23 | 14 May 2023 | 90 | 14 March–13 May 2023 | 49.52 | 44.88 | 5.17 | 0.43 | – | – |
| US 2020 | USA’20 | 3 November 2020 | 280 | 21 January 2019–2 November 2020 | 51.4 | 46.9 | – | – | – | – |
| US 2024 | USA’24 | 5 November 2024 | 143 | 14 February–4 November 2024 | 48.4 | 50.0 | – | – | – | – |
Table 2. The estimation error TPE for all forecasting methods across the ten electoral datasets; from left to right: best pollster prediction (BPP), exponential TA (ETA), power-law TA (PTA), ARIMA (AR), survey averaging (SA), and cumulative counting (CC).
| Elections | BPP | ETA | PTA | AR | SA | CC |
|---|---|---|---|---|---|---|
| ARG’23 | 11.21 | 9.45 | 9.32 | 10.96 | 21.69 | 18.55 |
| BRZ’22 | 8.07 | 8.26 | 8.26 | 14.44 | 16.01 | 10.28 |
| CND’21 | 6.90 | 8.38 | 8.57 | 7.52 | 10.90 | 14.26 |
| FRA’22 | 9.63 | 10.48 | 10.44 | 11.27 | 30.33 | 31.47 |
| IDO’24 | 4.81 | 2.19 | 2.05 | 6.88 | 19.80 | 14.35 |
| POL’20 | 1.98 | 1.54 | 1.55 | 4.71 | 11.55 | 10.18 |
| ROM’24 | 29.90 | 25.35 | 25.31 | 28.49 | 42.19 | 43.03 |
| TRK’23 | 1.10 | 0.51 | 0.37 | 6.84 | 12.29 | 13.07 |
| USA’20 | 2.30 | 1.70 | 1.70 | 4.61 | 5.97 | 3.48 |
| USA’24 | 0.60 | 1.60 | 1.60 | 2.35 | 2.82 | 2.18 |
| Mean | 7.65 | 6.95 | 6.92 | 9.81 | 17.36 | 16.09 |
Table 3. MAPE (left) and RMSE (right) for all forecasting methods across the ten electoral datasets; from left to right: best pollster prediction (BPP), exponential TA (ETA), power-law TA (PTA), ARIMA (AR), survey averaging (SA), and cumulative counting (CC).
| Elections | BPP | ETA | PTA | AR | SA | CC | BPP | ETA | PTA | AR | SA | CC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARG’23 | 19.83 | 14.65 | 14 | 17.18 | 27.48 | 26.15 | 3.28 | 3.02 | 3.01 | 3.35 | 5.04 | 4.37 |
| BRZ’22 | 28.03 | 27.58 | 27.64 | 38.29 | 38.47 | 40.85 | 2.68 | 2.31 | 2.31 | 4.45 | – | 3.25 |
| CND’21 | 19.64 | 21.49 | 21.91 | 18.33 | 31.76 | 41.08 | 1.23 | 1.98 | 2.02 | 1.32 | 2.1 | 2.75 |
| FRA’22 | 11.55 | 12.88 | 12.8 | 13.84 | 38.01 | 40.82 | 2.26 | 2.46 | 2.44 | 2.69 | 6.82 | 7.17 |
| IDO’24 | 3.77 | 2.5 | 2.25 | 6.76 | 21.2 | 18.57 | 2.23 | 0.77 | 0.72 | 2.68 | 7.96 | 5.85 |
| POL’20 | 1.82 | 1.57 | 1.58 | 4.33 | 12.23 | 11.9 | 0.77 | 0.51 | 0.52 | 1.91 | 4 | 3.86 |
| ROM’24 | 31.57 | 26.47 | 26.45 | 28.8 | 55.3 | 57.12 | 6.86 | 5.85 | 5.84 | 6.5 | 8.34 | 8.42 |
| TRK’23 | 9.91 | 1.18 | 0.28 | 20.01 | 196.32 | 235.07 | 0.31 | 0.15 | 0.12 | 2.08 | 3.13 | 3.29 |
| USA’20 | 2.32 | 1.77 | 1.76 | 4.9 | 6.22 | 3.46 | 1.17 | 0.93 | 0.93 | 3.17 | 3.34 | 1.94 |
| USA’24 | 0.62 | 1.63 | 1.63 | 2.38 | 2.82 | 2.24 | 0.42 | 0.86 | 0.86 | 1.19 | 1.82 | 1.35 |
| Mean | 12.91 | 11.17 | 11.03 | 15.48 | 42.98 | 47.73 | 2.12 | 1.89 | 1.88 | 2.93 | 4.76 | 4.23 |
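For reference, the MAPE and RMSE figures reported in Table 3 follow the standard definitions. A minimal sketch of both metrics, assuming a forecast and an official result are each given as a vector of candidate vote shares (the three-candidate numbers below are illustrative, not taken from the article's datasets):

```python
import math

def mape(forecast, actual):
    """Mean absolute percentage error between forecast and official shares (%)."""
    return 100 * sum(abs(f - a) / a for f, a in zip(forecast, actual)) / len(actual)

def rmse(forecast, actual):
    """Root-mean-square error, expressed in percentage points."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))

# Illustrative three-candidate race: forecast vs. official shares (%)
forecast = [50.0, 30.0, 20.0]
actual = [48.0, 32.0, 20.0]
print(round(mape(forecast, actual), 2))  # 3.47
print(round(rmse(forecast, actual), 2))  # 1.63
```

MAPE divides each error by the candidate's actual share, which explains the very large SA/CC values for TRK'23 in Table 3: small-share candidates (e.g., 0.43%) inflate relative errors even when absolute errors are modest.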
Table 4. The column ‘Lead’ quantifies the point difference between the winner and the runner-up of each election. Subsequent columns indicate the relative deviation of each forecasting model from this lead; a deviation closer to zero signifies a more accurate estimation of the winner's margin. The last row reports the mean deviation over all datasets.
| Elections | Lead | BPP | ETA | PTA | AR | SA | CC |
|---|---|---|---|---|---|---|---|
| ARG’23 | 6.70 | −7.10 | −5.58 | −5.45 | −6.66 | −4.06 | −4.82 |
| BRZ’22 | 5.23 | 4.77 | 5.04 | 5.05 | 5.94 | 5.00 | 5.69 |
| CND’21 | 1.10 | −1.10 | 2.07 | 2.03 | −1.19 | −5.20 | −7.06 |
| FRA’22 | 4.70 | −1.70 | −1.67 | −1.68 | −1.59 | 3.78 | 4.42 |
| IDO’24 | 33.64 | −3.14 | −1.58 | −1.60 | −2.86 | −10.89 | −7.21 |
| POL’20 | 13.04 | −1.76 | −1.07 | −1.10 | −3.72 | 2.28 | 3.20 |
| ROM’24 | 3.76 | −13.46 | −11.86 | −11.86 | −11.79 | −10.23 | −9.82 |
| TRK’23 | 4.64 | −0.54 | 0.42 | 0.35 | −5.76 | −6.83 | −6.38 |
| USA’20 | 4.50 | −0.50 | −0.80 | −0.78 | 4.37 | 3.01 | 3.48 |
| USA’24 | 1.6 | −0.6 | −0.68 | −0.68 | −2.35 | −2.32 | −2.18 |
| Mean | 7.89 | −2.51 | −1.57 | −1.57 | −2.56 | −2.55 | −2.07 |
Table 5. The estimation error TPE measured at several points in time during the USA’20 (upper half) and USA’24 (lower half) pre-election periods. Here, t_e represents the election date, and t = t_e − X denotes poll estimations obtained X days before election day. The last three columns display the point difference between PTA and BPP, AR, and CC, respectively; negative differences (Δ < 0) translate to a lower error and thus better performance of PTA.
| Election | Time | BPP | ETA | PTA | AR | SA | CC | Δ(PTA−BPP) | Δ(PTA−AR) | Δ(PTA−CC) |
|---|---|---|---|---|---|---|---|---|---|---|
| USA’20 | t_e | 2.30 | 1.70 | 1.70 | 4.90 | 6.23 | 3.47 | −0.60 | −3.20 | −1.77 |
| USA’20 | t_e − 50 | 5.50 | 3.22 | 3.20 | 6.97 | 6.68 | 3.47 | −2.30 | −3.77 | −0.27 |
| USA’20 | t_e − 100 | 5.50 | 6.14 | 6.22 | 7.64 | 6.48 | 3.55 | 0.72 | −1.42 | 2.67 |
| USA’20 | t_e − 150 | 11.30 | 7.24 | 6.90 | 6.78 | 6.03 | 3.04 | −4.40 | 0.12 | 3.86 |
| USA’20 | t_e − 200 | 8.30 | 1.70 | 1.70 | 6.45 | 5.28 | 3.41 | −6.60 | −4.75 | −1.71 |
| USA’24 | t_e | 0.60 | 1.60 | 1.60 | 2.35 | 2.82 | 2.18 | 1.00 | −0.75 | −0.58 |
| USA’24 | t_e − 50 | 5.60 | 1.64 | 2.30 | 3.91 | 3.79 | 1.84 | −3.30 | −1.61 | 0.46 |
| USA’24 | t_e − 100 | 7.40 | 1.60 | 1.60 | 4.74 | 4.74 | 1.60 | −5.80 | −3.14 | 0.00 |
| USA’24 | t_e − 150 | 5.40 | 1.76 | 1.70 | 3.90 | 3.90 | 1.78 | −3.70 | −2.20 | −0.08 |
| USA’24 | t_e − 200 | 9.40 | 1.76 | 1.70 | 9.40 | 9.40 | 1.78 | −7.70 | −7.70 | −0.08 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Topîrceanu, A. Macro-Scale Temporal Attenuation for Electoral Forecasting: A Retrospective Study on Recent Elections. Mathematics 2025, 13, 604. https://doi.org/10.3390/math13040604
