1. Introduction
Electoral forecasting represents a significant scientific topic in our increasingly interconnected society and attracts considerable attention across various research disciplines [1,2,3]. Historically, forecasting election results has relied on traditional statistical models that assess opinion polls before election day [4,5]. Since the late 1970s, the strategic selection of the election date has been recognized as potentially impacting the results [5]. In presidential elections, forecasting uses macromodels [6], which are statistical models reflecting national economic and political changes. Conversely, micromodels are derived from individual voter surveys conducted during the pre-election phase [7].
Today, forecasting election results benefits from modern techniques such as social media analysis and data science [8,9,10]. Advanced forecasting now often incorporates multilevel regression and post-stratification (MRP) [11,12,13]. Prominent forecasting institutions, especially present in the United States, such as Real Clear Politics, FiveThirtyEight, and Understanding America Study, analyze election data using MRP variants. A concise overview of the methodologies utilized by these pollsters comprises the following steps:
The polls are weighted/averaged according to the pollster’s reliability.
The polls are adjusted based on factors such as anticipated voter turnout, convention effects, and exclusion of third-party candidates.
Economic and demographic data are incorporated to scale surveys at the state and national tiers.
The simulation employs probabilistic distributions to represent data uncertainty.
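The first step above, reliability weighting, can be sketched in a few lines of Python. The pollster weights and poll shares below are hypothetical illustrations, not values from any real rating system:

```python
# Minimal sketch of reliability-weighted poll averaging (step 1 above).
# Reliability weights and candidate shares are hypothetical illustrations.

def weighted_poll_average(polls):
    """Average candidate shares, weighting each poll by pollster reliability.

    polls: list of (reliability_weight, {candidate: share}) tuples.
    Returns a dict of weighted-average shares per candidate.
    """
    totals, weight_sum = {}, 0.0
    for weight, shares in polls:
        weight_sum += weight
        for candidate, share in shares.items():
            totals[candidate] = totals.get(candidate, 0.0) + weight * share
    return {c: v / weight_sum for c, v in totals.items()}

polls = [
    (0.9, {"A": 48.0, "B": 52.0}),  # highly rated pollster
    (0.5, {"A": 55.0, "B": 45.0}),  # less reliable pollster
]
avg = weighted_poll_average(polls)
```

The remaining steps (turnout adjustments, demographic scaling, probabilistic simulation) build on this same weighted aggregate.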
Alongside MRP, auto-regressive integrated moving-average (ARIMA) models have been successfully used to forecast the UK General Election from daily tweets [14] and Brexit from the daily fluctuations of the British Pound [15]. Studies also suggest that alternative subjective surveying methods can be effective predictors. For instance, the American National Election Surveys from 1956 to 1996 indicate that voters themselves proved capable of predicting presidential election winners [7]. Another study found that quick, instinctive facial assessments of political candidates better predicted election results than detailed competence evaluations [16]. Therefore, voter expectation-based prediction models present a promising substitute for traditional statistical methods [17].
We highlight notable research gaps in current electoral forecasting related to temporal factors. First, traditional models (e.g., ARIMA or statistical polling averages) offer limited integration of temporal dynamics, focusing on snapshots of polling data and neglecting the evolving nature of opinion over time [18]. Second, the inconsistent use of poll timing or poll frequency across different time periods hinders identifying the weight of more recent polls and predicting last-minute opinion swings [19]. Third, the decaying influence of older polls and the momentum of recent trends are often underexplored or oversimplified in forecasting methodologies [18]. Fourth, real-time forecasting models that adjust dynamically as new polls or events emerge are lacking [20]. Fifth, the role of external events (e.g., crises, scandals) in altering temporal opinion trends is not consistently captured [21]. Addressing these research gaps has numerous implications, including improved overall forecasting accuracy, the identification of critical time windows, a better understanding of voter behavior, early detection of trends and momentum, and support for real-time decision-making.
Related studies have explored how the extensive media coverage of opinion polls might influence voters before elections [22]. Social networks are pivotal in the dissemination of information, effectively shaping large-scale events [8,23], as seen in the Arab Spring of 2010 [24,25] and the US presidential elections of 2008 [26] and 2012 [27]. These platforms enable a widespread distribution of information that forms a contemporary social dimension [10]. Understanding this dynamic layer enhances predictive insights into the real-world social networks they replicate, with applications in marketing, public relations [3,28], epidemic spreading [29,30], hurricane forecasting [31], and predicting box-office earnings [32].
In the more recent literature, we find works such as Leiter et al. [33], who argue that individual voter predictions, mainly based on social network activity, often have higher accuracy than voter intention polls. The study considers direct measures of social networks and finds that network size, political composition, and frequency of political discussion are the most significant network parameters predicting the accuracy of citizens’ election forecasts. Furthermore, Jennings et al. [34] provide a data-driven estimation of how much “lead” in the polls allows for the prediction of the election outcome, and offer estimates of the optimal duration to be leading in the polls. Liu et al. [21] propose a more complex integration of Twitter data into political forecasting than before: tweet sentiment is used to replace polling data, and classical forecasting models are transformed from the national level to the county level (i.e., the finest spatial level of voting counts in the US). The work published by Kennedy et al. [35] relies on Bayesian additive regression trees (BARTs) to analyze election datasets. Their model is able to predict 80–90% of direct elections from global data. Their most important conclusions are that economic indicators are weak predictors of elections, while global polling is a robust predictor of outcomes. The authors also predict that quantitative electoral forecasts will remain the most viable forecasting solution in the near future. Finally, we note the work of Wang et al. [36], which proposes that non-representative polls may be used instead of representative polls, with proper statistical adjustments. The former have the advantage of being faster and cheaper to gather than traditional survey methods. The results from the Xbox gaming platform, adjusted with MRP (multilevel regression and poststratification), were able to provide forecasts as accurate as those of the leading pollsters.
Network science aims to grasp diffusion processes by structuring interactions at the micro scale (i.e., among individuals) and then predicting opinion shifts at the macro scale [37]. The macroscopic outcomes can be deduced by: (i) identifying when individuals are indoctrinated by their surroundings (i.e., adopting information, contracting infections, purchasing goods [1,3,38,39]), and (ii) forecasting the dissemination of information cascades and their individual-driven diffusion. Timing factors are critical in influencing diffusion dynamics [9,10], as well as in the timing of releasing opinion polls [22] or organizing elections [5].
Indeed, temporal factors play a critical role in election prediction due to the dynamic nature of public opinion and external events. For instance, during the 2016 U.S. presidential race, an impactful announcement regarding Hillary Clinton’s (the Democratic candidate) emails occurred just days before the election, potentially swaying undecided voters towards the Republican candidate. Also, in the final days leading up to the 2016 UK Brexit referendum, the assassination of an MP shocked the nation, briefly shifting public sentiment toward the “Remain” camp. During the 2017 French presidential election, some voters shifted their support in the final days toward a more viable candidate to avoid “wasting” their vote, a phenomenon known as tactical voting. Thus, voters rallied around Emmanuel Macron to prevent Marine Le Pen from winning. Similarly, the COVID-19 pandemic significantly impacted election dynamics in 2020, emphasizing healthcare and crisis management in voter decision-making. Notably, Donald Trump’s handling of COVID-19 became a key differentiator for Joe Biden, who emphasized a science-based and empathetic approach, likely helping him secure the presidency.
The motivation of our study is to examine how social macroscopic behavior, in varied electoral contexts across the world, can be inferred just from microscopic temporal dynamics during pre-election phases. Here, we rely on our innovative macro-scale temporal attenuation model (TA), previously introduced in [40], which is tested here in terms of electoral forecasting accuracy, relying only on pre-election poll data for each election, from each chosen country. We introduce two variants of the TA model (i.e., ETA and PTA, described in Section 2), which are distinct in their ability to capture the fluctuating pulse of public opinion as it evolves with the input of poll data.
The effectiveness of our forecasting model is assessed through the total positive error (TPE), the mean absolute percentage error (MAPE), and the root mean squared error (RMSE). The obtained poll predictions are compared with ARIMA models fitted to pre-election data and the best pollster predictions (BPPs) from each studied country, incorporating newer MRP-based forecasts (mainly those from the US). As such, our paper makes the following contributions:
We devise an analytic approach for the macro-scale modeling of multi-opinion systems to improve election poll forecasts.
We establish an experimental framework utilizing pre-election data to test the foundational assumptions of our method.
We provide a case study on ten recent elections from different countries between 2020 and 2024, evaluating our method’s efficiency against advanced forecasts, including ARIMA and MRP, utilized by top pollsters.
We investigate the real-time applicability of the two TA variants during pre-election periods and compare their performance against the best pollster predictions at various intervals leading up to the election day.
The main aim of this study is to establish and validate an electoral forecasting method that operates independently of demographic, economic, or political-context data. This differentiates our TA method favorably from methods like MRP by allowing its application, given sufficient reliable public polls, across any global political region, without contextual knowledge. Consequently, the results presented in this paper were obtained without incorporating additional information about any of the chosen countries. Given the persisting socio-political challenges, our research holds substantial societal value, particularly for industry and governments, by offering insights for an enhanced understanding of electoral system dynamics.
3. Materials and Methods
Prediction models that incorporate microscopic interactions frequently depict data as information cascades initiated by opinion sources (also referred to as spreader nodes, stubborn agents, or vital nodes) [41,42,43]. Although these interactions hold significance, it remains technically, ethically, and legally impractical to account for them all (for instance, scrutinizing every tweet from all users in a country before an election). Therefore, we utilize widespread data, specifically pre-election opinion polls, which are accessible and aggregated from various open-access sources, as detailed in Appendix C.
In our previous study [40], where we first introduced the concept of temporal attenuation (TA), we presented a case study examining US presidential elections from 1968 to 2016, in order to highlight the effectiveness of the TA method compared to several statistical methods for electoral prediction. As such, our previous study tailored TA specifically to the US electoral system: to predict the popular vote for the two candidates from the main political parties.
In terms of the novelty of this study, the idea of a charging and discharging capacitor, in relation to the opinion polls being made public, was introduced in [40] as the “momentum” of opinion. The momentum was influenced directly by the raw number of votes received by each candidate from a poll. In contrast, in our current study, we added the opinion weight, which acts like the previous momentum but uses the normalized number of votes. We noticed that normalization increases the applicability of the model in multi-party systems. Next, the newly added concept of the opinion pulse is based on the opinion weight and implements the actual exponential or power-law-based attenuation.
Secondly, the current study demonstrates the flexibility of the improved mathematical model, namely by using TA on more heterogeneous datasets (in terms of number of polls and pre-election period duration) and in several multi-party systems (any number of candidates), using 10 datasets from nine different countries around the world, each with significant socio-political differences. We believe that both the mathematical and practical improvements are valuable for the community, since they support the wider applicability of our TA model.
We further compare poll estimates obtained through our ETA and PTA methods against traditional statistical approaches such as ARIMA modeling (AR), survey averaging (SA), cumulative vote counting (CC), and the best pollster predictions (BPPs) of the time. The actual poll results from each election serve as the ground truth for validating forecasts.
3.1. Election Datasets
We aggregate electoral pre-election poll data from the following countries (in alphabetical order): Argentina (2023), Brazil (2022), Canada (2021), France (2022), Indonesia (2024), Poland (2020), Romania (2024), Turkey (2023), and the USA (2020, 2024). We note that the respective Canadian parliamentary elections were for the election of the Prime Minister, while all other datasets correspond to presidential elections. In all cases, we aim to predict the popular vote of the first round (to which the pre-election polls refer). For the US, data were aggregated from the Real Clear Politics website. For all other countries, we found the Wikipedia pages dedicated to the presidential elections to be the most complete and reliable sources for pre-election data.
Table 1 provides information on all 10 datasets, including the election date, the number of pre-election (PE) polls available, the PE period, and the final results for each candidate. We chose varied electoral systems, with 2–6 (main) candidates, to showcase that TA works well in all situations. The percentages displayed in Table 1 are used as the ground truth to measure the performance of our TA method. Additional information on the input data, including the names of all candidates, is found in Appendix D.
The inclusion of these datasets aims to ensure the relative recency of the elections, a broad coverage of different electoral systems, and a high variability in terms of dataset sparseness. A close look at Table 1 shows that we are using electoral datasets with as few as 52 pre-election polls (Indonesia) and as many as 366 (Canada); the pre-election period ranges from 1.5 months (Poland) to 21 months (USA); the number of candidates varies from 2 to 6 (where necessary, we excluded all candidates beyond the top six). In terms of geographic diversity, we managed to collect data from four continents. Unfortunately, we did not find sufficient data to include any African countries. Finally, we also intended to provide diversity in terms of the democratic robustness of the electoral system [48].
3.2. Statistical Methods for Poll Estimation
To quantify the electoral forecasting accuracy of our two TA methods (ETA and PTA), we present statistical benchmarks that allow for comparison between traditional models and the proposed TA models. These benchmarks are categorized into two groups: statistical estimates, such as cumulative counting (CC) and survey averaging (SA), and advanced statistical models, including ARIMA (AR) and the best pollster predictions (BPPs).
In cumulative vote counting (CC), the total votes for each electoral candidate $k$ are obtained by aggregating votes from polls over the entire pre-election period $[0, T]$. Importantly, CC utilizes the absolute vote count per poll, i.e., $v_k(t)$, rather than the normalized count $\hat{v}_k(t)$. Thus, we define a similar cumulative opinion pulse $p_k(t)$ as follows:

$$p_k(t) = \sum_{\tau \le t} v_k(\tau) \quad (7)$$

When the polling period ends, i.e., $t = T$, each cumulative opinion pulse $p_k(T)$ retains the aggregate votes for each of the $N$ candidates. The current opinion towards a candidate $k$, $\omega_k(t)$, can be determined at any moment by normalizing the respective opinion pulses:

$$\omega_k(t) = \frac{p_k(t)}{\sum_{j=1}^{N} p_j(t)} \quad (8)$$
Survey averaging (SA) involves calculating the mean of the normalized poll results throughout the pre-election timeframe. To compute the general opinion $\omega_k(t)$ after $t$ days have passed since the beginning of the pre-election period, the normalized poll counts $\hat{v}_k(\tau)$ are utilized directly:

$$\omega_k(t) = \frac{1}{|P(t)|} \sum_{\tau \le t} \hat{v}_k(\tau) \quad (9)$$

Equation (9) expresses the general opinion at any time $t$ ($0 \le t \le T$) by totaling all normalized votes from $[0, t]$ and dividing by the number of existing polls (the cardinal $|P(t)|$) in that interval. For the whole pre-election period, when $t = T$, the general opinion towards a candidate $k$, based on survey averaging, becomes $\omega_k(T)$. The key difference between SA and CC lies in the reliance of CC on the absolute vote count for candidates, while SA utilizes normalized polling data.
ARIMA (AR) models rely solely on the time-series data of a dependent variable, ignoring any independent variables. They utilize current and historical data points of the variable to produce dependable short-term forecasts. ARIMA models have been applied effectively to the UK’s Brexit forecast by analyzing British Pound fluctuations [15], to the UK General Election through Twitter data [14], and, most commonly, to predicting stock-market trends [49].
Often, the usage of AR starts with a grid search for hyper-parameter optimization to identify optimal combinations of (p, d, q), where p is the autoregressive order, d is the degree of differencing for transforming nonstationary data into stationary, and q is the moving-average order. Initial experiments point to an ARIMA(5,1,0) model for the election datasets. However, we decided to use the auto.arima method available in the R package forecast, which uses a variation of the Hyndman–Khandakar algorithm [50] that combines multiple automatic tests. As a benchmark, ARIMA estimations are made on election day, specifically 1–9 days after the final pre-election poll was published in each of our ten datasets.
Finally, we employ the best pollster predictions (BPPs) for each election cycle. The top pollsters in each election dataset, along with their candidate projections, are shown in Appendix D. Over the last decade, pollsters have incorporated optimal statistical and computational techniques, such as MRP, to improve prediction accuracy. Including BPPs in our comparative analysis serves as a significant reference point, since these pollsters employ the state of the art in current electoral forecasting; thus, their predictions are the best real-world benchmark.
3.3. Forecasting Performance Metrics
For each forecast based on TA with a given magnitude and damping parameter, we quantify the total positive error TPE as an absolute point deviation from the actual election results. Using the ground-truth results, denoted as $R_k^d$ (dataset $d$ symbols are defined in Table 1), the total positive error TPE is defined as the cumulative positive differences in estimation errors for all $N$ candidates:

$$\mathrm{TPE} = \sum_{k=1}^{N} \left| R_k^d - \hat{R}_k^d \right| \quad (10)$$

where $\hat{R}_k^d$ denotes the forecasted result for candidate $k$ in dataset $d$.
The metric TPE provides a gauge of performance and serves as an intuitive performance indicator of TA compared to the state-of-the-art forecasting methods. However, due to the underlying simplicity of TPE, we supplement all of our model’s performance evaluations with the mean absolute percentage error (MAPE) and the root mean squared error (RMSE) [51].
MAPE and RMSE serve to evaluate our TA method’s forecasting accuracy against ground-truth results for each electoral candidate. The formal definition of MAPE is:

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{k=1}^{N} \left| \frac{R_k^d - \hat{R}_k^d}{R_k^d} \right| \quad (11)$$

MAPE denotes a percentage deviation between the actual outcomes ($R_k^d$, in dataset $d$ for candidate $k$) and the forecasted outcomes ($\hat{R}_k^d$).
RMSE denotes an unsigned absolute difference between actual results and predicted outcomes:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( R_k^d - \hat{R}_k^d \right)^2} \quad (12)$$

In Equations (11) and (12), $N$ represents the number of candidates assessed. An average MAPE and RMSE can be calculated across all election years.
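The three error metrics follow directly from their standard definitions and can be sketched in a few lines; the actual and predicted vote shares below are hypothetical illustrations, not values from any dataset in Table 1:

```python
# Sketch of the three forecasting error metrics over N candidates.
# `actual` and `predicted` hold hypothetical per-candidate vote shares (%).
import math

def tpe(actual, predicted):
    """Total positive error: sum of absolute point deviations per candidate."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percentage error across the N candidates."""
    return 100.0 / len(actual) * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    """Root mean squared error across the N candidates."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

actual = [51.3, 46.8]       # hypothetical ground-truth popular vote (%)
predicted = [50.0, 48.0]    # hypothetical forecast
```

For these values, TPE is 2.5 points, while MAPE and RMSE weight the same deviations relatively and quadratically, respectively.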
Moreover, in our comparative analysis, we utilize the statistical accuracy (Acc) metric. Accuracy is the ratio of correctly classified instances, including both true positives and true negatives, to the total instances. We specifically measure the successful prediction of the election winner from each of the 10 election datasets based on the estimated popular vote.
Finally, we note that we did not take into account any dataset or other geo-political characteristics when choosing the parameters of the TA model, because our intention is to provide a model that can tackle, to the best possible extent, any electoral dataset, without significant restrictions or drawbacks. Thus, we used fixed values for the model parameters in all experiments, namely the magnitude for both ETA and PTA and the damping factors for ETA and PTA. We repeated measurements for increasing values of the magnitude and damping parameters and chose the single combination which determined the lowest average TPE over all ten datasets. Thus, our results summarize the average performance of TA based on the best trade-off combination of these parameters. Individually, we can find better combinations of these parameters for each dataset, but not on average.
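The fixed-parameter selection described above amounts to a small grid search. In the sketch below, `ta_forecast` is a hypothetical stand-in for the actual ETA/PTA computation (defined in Section 2), and the toy dataset exists only to make the example runnable:

```python
# Sketch of the parameter sweep: keep the single (magnitude, damping)
# combination with the lowest TPE averaged over all datasets.
# `ta_forecast` is a hypothetical stand-in for the real ETA/PTA model.

def select_parameters(datasets, ta_forecast, tpe, magnitudes, dampings):
    best, best_err = None, float("inf")
    for m in magnitudes:
        for d in dampings:
            # Average TPE of this (magnitude, damping) pair over all datasets.
            err = sum(tpe(truth, ta_forecast(polls, m, d))
                      for polls, truth in datasets) / len(datasets)
            if err < best_err:
                best, best_err = (m, d), err
    return best, best_err

# Toy demonstration with a contrived forecast function.
def _toy_forecast(polls, m, d):
    return [50.0 + m - d, 50.0 - m + d]

def _tpe(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted))

datasets = [(None, [51.0, 49.0])]   # (polls, ground truth) pairs
best, err = select_parameters(datasets, _toy_forecast, _tpe,
                              magnitudes=[0, 1, 2], dampings=[0, 1])
```

Because the combination is chosen on the cross-dataset average, individual datasets may prefer other parameter values, as the text notes.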
4. Results
An effective electoral forecasting approach must meet two criteria: (i) it must yield poll estimates for all candidates that closely match actual voting outcomes, and (ii) it must reliably predict the election winner. We evaluate the first property using the total positive error TPE, MAPE, and RMSE (refer to Equations (10)–(12)). A reduced TPE indicates better forecasting, while MAPE and RMSE values closer to 0 translate into greater forecast accuracy.
Table 2 shows the TPE metric, while Table 3 provides a summary of MAPE and RMSE between each forecasting technique and the ground-truth election outcomes. On average, across all datasets, the BPP exhibits a total positive error TPE of 7.65 points, while SA and CC show much higher errors of 17.36 and 16.09 points, respectively. AR is situated ‘in-between’, with a TPE of 9.81 points. Our methods present the smallest TPEs, with only 6.95 points for ETA and 6.92 points for PTA. Furthermore, the mean MAPE values are 11.17 for ETA, 11.03 for PTA, 12.91 for BPP (i.e., 17% higher than PTA), 15.48 for AR, 42.98 for SA, and 47.73 for CC. Regarding RMSE, ETA registers 1.89 points, PTA 1.88, BPP 2.12 (i.e., 13% higher than PTA), AR 2.93, SA 4.76, and CC 4.23.
Individual and mean results in Table 2 and Table 3 indicate that our TA methods outperform all other state-of-the-art forecasting methods. Compared to the BPP estimates, we measure average improvements in TPE of 0.70 points for ETA and 0.73 points for PTA. Moreover, ETA and PTA show substantial improvements of 2.86 and 2.89 points, respectively, over AR estimations in terms of TPE.
Several statistical estimates are used to quantify the second stated property (ii). Initially, we calculate the lead, defined as the point difference between the winner and the runner-up of each election. For reference, the electoral percentages of all the candidates can be found in Table 1.
Table 4 outlines the differences between the leads estimated by each forecasting measure as relative offsets from the actual lead (given in column Lead). For example, in dataset ARG’23, the measured lead is 6.70 points, resulting from the point difference C1–C3 found in Table 1; similarly, the ETA offset equals −5.58 points, indicating that ETA predicted a lead of only 1.12 points between the top two candidates. The average lead offsets for each statistical method, found in the last line of Table 4, highlight the superior ability of our TA methods to predict the election winner. Specifically, ETA and PTA again exceed all other forecasting methods, with an average lead offset of −1.57, which is 60% lower (better) than the average lead offset of the BPP and 63% lower than the ARIMA average offset.
We further examine the extent to which the correct winning presidential candidate was identified in the ten election datasets. To assess prediction efficacy, we apply the statistical accuracy (Acc), as defined in Section 3.3. An analysis of the predictions of each statistical method in the ten datasets shows that the reference statistical estimates (SA, CC) are less effective, with SA correctly anticipating only 5 out of 10 presidential outcomes (Acc = 0.5) and CC successfully predicting 6 out of 10 (Acc = 0.6). Moreover, the BPP predicted 7 out of 10 winners (Acc = 0.7), AR predicted 6 out of 10 (Acc = 0.6), whereas both ETA and PTA predict 9 out of 10 winners (Acc = 0.9). Notably, no method predicted the correct result in the ROM’24 dataset, due to the local pollsters’ large margin of error.
Additionally, Figure 3 emphasizes the strong forecasting capability of the TA methods when compared to competing methodologies. It is important to note that our TA techniques surpass BPP in terms of total positive error TPE in 6 out of 10 datasets (see Figure 3A). Here, all vertical bars represent the offset between the BPP TPE and the TPE of the compared forecasting method; thus, positive values represent higher performance compared to the BPP. Regarding MAPE, ETA and PTA achieve better performance than BPP in 7 out of the 10 election datasets (see Figure 3B). In terms of RMSE, the TA methods exceed the BPP performance in 7 out of 10 datasets. In general, both our TA methods demonstrate superior performance over the AR benchmark on 9 out of 10 datasets.
Across all metrics tested, our TA methods clearly demonstrate superior forecasting accuracy in 6 out of 10 election datasets, and out of the 40 (datasets × other competing methods) comparison scenarios, our TA methods are superior in 36/40 (90%) of comparisons.
Figure 3A,B visually depict the offset between the BPP prediction errors (TPE and MAPE) and our TA methods’ prediction errors. Thus, columns with values exceeding 0 indicate that our TA methods have better performance.
Figure 3C shows a heat map indicating whether our ETA or PTA methods exceed the BPP benchmark (highlighted in green, otherwise in orange) for each electoral dataset. We note that the lowest performance of our TA methods is achieved on the CND’21 dataset, with a relative TPE offset of −1.67 to −1.48 compared to the BPP, whereas the highest performance is achieved on the ROM’24 dataset, with a TPE offset of 4.55–4.59. The mean TPE offsets, averaged across all ten datasets, are 0.70 for ETA and 0.73 for PTA. Finally, Figure 3D presents a heat map that highlights the correct prediction of the election winner (green cells) for each forecasting method in each of the ten datasets. For example, one can quickly assess that ETA missed only 1 in 10 winner predictions, while AR missed 4 in 10 predictions. Compared to the BPP, our TA methods manage a superior 90% winner prediction accuracy.
Real-Time Forecasting Performance Analysis
Next, our analysis is focused on assessing the applicability of our TA models during an ongoing pre-election timeframe. Thus, we employ the USA’20 and USA’24 datasets to evaluate the prediction errors TPE, MAPE, and RMSE at various time points before the election date. The USA’20 dataset includes 280 polls conducted over 651 days leading up to the elections, and the USA’24 dataset includes 143 polls conducted over 264 days. We evaluate predictions and their respective performances at several moments spread across each pre-election period.
Table 5 shows the total positive errors TPE across all six forecasting methods, with Real Clear Politics being used as the BPP for both US elections. The absolute point differences between PTA and BPP, AR, and CC are given in the last three columns of Table 5. Positive differences indicate the superior performance of PTA over the respective forecasting method. Additional detailed experimental findings, including MAPE and RMSE measurements, are available in Appendix E.
Our analysis indicates that the forecast precision of the statistical techniques (AR, SA, CC) is primarily influenced by the data volume, as their predictions gradually converge towards the real electoral outcomes. In contrast, the more advanced forecasting methods rely not on accumulating data, especially as the elections near, but on the unpredictability of the political and social landscapes. For instance, BPP showcases the highest variability, with fluctuations from a low TPE = 5.40 in June 2024 to a significantly higher TPE = 7.40 by the end of July 2024, before dropping back to TPE = 5.60 in October 2024. Our TA methodologies exhibit greater temporal stability compared to BPP, showing resilience against the social fluctuations observed in BPP. In USA’20, we note that candidate C1 (Democratic) reached their lowest popularity in April–May 2020, while candidate C2 (Republican) gained proportional popularity; this volatility is reflected by the TA methods in assigning C2 an increased theoretical win probability. However, as the polls balanced out again in mid-summer 2020, our predictions improved, aligning more closely with a balanced result akin to the actual popular vote of 2020. Moreover, in USA’24, there was increased volatility, as the Democratic candidate was changed mid-summer. Here, even though BPP is characterized by a higher TPE during the whole pre-electoral period, our ETA and PTA methods maintain lower TPE values. When looking at the TPE point differences, we notice that PTA is superior to the other competing methods most of the time before the election date.
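The real-time evaluation above amounts to recomputing a forecast at each cut-off day using only the polls published up to that day. In the sketch below, the simple poll average is a placeholder for any of the six methods, and the poll values are hypothetical:

```python
# Sketch of the real-time evaluation: at each cut-off day, forecast from
# only the polls visible so far and record the TPE against the final result.
# The simple average below is a placeholder forecast; poll data are hypothetical.

def tpe(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted))

def average_forecast(polls):
    """Plain average of per-candidate shares over the visible polls."""
    n = len(polls)
    return [sum(shares[k] for _, shares in polls) / n
            for k in range(len(polls[0][1]))]

def realtime_tpe(polls, actual, cutoffs, forecast=average_forecast):
    """polls: list of (day, [shares...]); cutoffs: evaluation days."""
    curve = {}
    for day in cutoffs:
        visible = [p for p in polls if p[0] <= day]
        if visible:
            curve[day] = tpe(actual, forecast(visible))
    return curve

polls = [(10, [48.0, 52.0]), (40, [50.0, 50.0]), (80, [52.0, 48.0])]
actual = [51.0, 49.0]
curve = realtime_tpe(polls, actual, cutoffs=[20, 60, 90])
```

Plotting such a TPE curve over cut-off days is exactly how the temporal stability of each method is compared in this section.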
Finally, in Figure 4 we provide a visual representation of the evolution of the total positive error TPE during the pre-election period, up to 200 days before the election day. We mark several points in time with diamond shapes, where the color of each shape reflects the forecasting method with the lowest TPE at that moment. In addition, the colored timeline at the bottom of each panel highlights the best predicting method at each moment in time.
We may easily conclude that PTA (violet) is, on average, the best forecasting method during the USA’20 pre-election period, while ETA (red) is, overall, the most performant method in the USA’24 pre-election period. In both Figure 4A,B, BPP displays the highest volatility in time and only manages to obtain a low TPE very close to the election date. Even though ETA and PTA use the same pre-election poll data, they manage to maintain lower volatility and, overall, a much lower TPE over time. The ARIMA (AR) method is less volatile than BPP; however, its relatively high average TPE makes it a less performant forecasting method.
Overall, our findings indicate that, in contrast to leading pollsters who depend on MRP validated by social, economic, and political trends, our ETA and PTA methods enhance their forecasting performance by focusing exclusively on the temporal convergence of public opinion, demonstrating a significant level of estimation precision [7,52,53].
5. Discussion and Conclusions
Our research distinguishes itself from prior electoral forecasting studies in multiple ways. Unlike straightforward statistical estimates such as cumulative counting (CC) and survey averaging (SA), our macro-scale temporal attenuation (TA) approach requires supplemental time-series data in the form of pre-election surveys specifying the date when each poll was released. In contrast to simply averaging the data (like SA), we input poll data into our computational framework, which is shaped by the temporal aspects of the opinion polls. Our two TA methods—namely ETA and PTA—both conceptualize opinion dynamics over time, portraying it as a function that rises when opinion poll data are introduced in the system and diminishes in its absence. We introduce an analogy with the “charging process of a capacitor” (i.e., injection of opinion) and its gradual discharge (i.e., a state of relaxation without new input). A capacitor charges along an exponential curve, much like how public opinion slowly shifts toward a particular viewpoint as more information is absorbed. Initially, it takes longer to influence the population, but as more people are exposed more often to the information, the consensus starts to form more rapidly. Similarly, the discharging rate can be influenced by external factors—the capacitor discharges based on the resistance in the circuit, and public opinion shifts in response to new information, events, or influential figures. The change is not instantaneous, but follows a similar exponential decay whose speed depends on the damping factor of our model.
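The charge/discharge analogy can be sketched numerically: each published poll injects opinion into a candidate's pulse, which then decays exponentially until the next poll. The damping value below is an illustrative assumption, not the fitted model parameter, and the sketch covers only the exponential (ETA-style) variant:

```python
# Hedged sketch of the capacitor analogy: the opinion pulse "charges" when
# a poll is published and "discharges" exponentially in the absence of input.
# The damping value is an illustrative assumption, not the fitted parameter.
import math

def opinion_pulse(poll_events, horizon, damping=0.05):
    """Exponentially attenuated opinion pulse for one candidate.

    poll_events: list of (day, normalized_share) injections.
    Returns the pulse value at `horizon` (e.g., election day).
    """
    pulse = 0.0
    last_day = 0
    for day, share in sorted(poll_events):
        pulse *= math.exp(-damping * (day - last_day))  # discharge since last poll
        pulse += share                                  # charge on poll arrival
        last_day = day
    # Final relaxation from the last poll up to the evaluation horizon.
    return pulse * math.exp(-damping * (horizon - last_day))
```

Because older injections are attenuated more, an identical poll published closer to the horizon contributes more to the pulse, which is precisely the recency weighting the capacitor analogy describes.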
In contrast to leading methods such as multilevel regression and post-stratification (MRP) [
11,
12,
13], recently adopted by top US pollsters, our TA does not require political, economic, or demographic context associated with the election. TA is based on a fundamentally different process from other forecasting methods, which rely on various socio-economic indicators. The only way to incorporate additional indicators into the model would be to translate those data into weights for each published poll, but that would change the model significantly and is beyond the scope of this study. This characteristic provides a strategic edge, allowing TA to be employed in any political region worldwide, provided sufficient, reliable public polling data exist. Consistent with the paper’s case study, which covers elections from nine different countries, we did not include any particular socio-economic information.
Our TA computational framework seeks to improve the predictability of the popular vote. Although we find specialized studies that address unique electoral systems, such as the US college-based system [
12,
13] and systems employing a direct popular vote, such as in France [
4], the TA forecasting model has, in contrast, been developed for broader application beyond specific political contexts, provided adequate and trustworthy pre-election polling data are available. Although this may seem to disadvantage the TA model within systems such as the US, TA still exhibits superior performance in practice. Furthermore, unlike other models that may require adaptation to each country, the TA model operates effectively without such modifications.
Being context-independent makes TA particularly robust in volatile political environments, where traditional forecasting models might struggle due to incomplete socio-economic data. Even in politically unstable regions, where external influences may disrupt voter sentiment, TA remains applicable as long as polling data are available and reflect evolving public opinion trends. Nevertheless, in regions where media access is heterogeneous, the dissemination of poll results may be asymmetric, leading to delayed or distorted opinion shifts among certain voter segments. Since TA does not directly incorporate media influence, its forecasts mirror the polls themselves, so any media-induced biases in the polling data will carry over into the predictions. However, the TA model’s reliance on temporal patterns rather than media-driven demographic adjustments makes it less vulnerable to the amplification of biases seen in models that require political or socio-economic inputs.
This research begins with the assumption that social networks amplify opinions derived from established public opinion polls because of their substantial media exposure. Recent studies on how adults inform themselves about political candidates and issues indicate that TV news leads with 73%, followed by news websites/apps at 45%, newspapers at 24%, and social media at 21%. These figures support our assumption, as poll-based opinions are disseminated across all these media channels [
54]. Moreover, despite the diversity of current media types, their collective reach is substantial, including during electoral periods, ensuring the reliability of polling accuracy [
55].
Alongside the comparison between TA and the best pollster predictions (BPPs) in each electoral system (see
Appendix D for more information), an ARIMA (AR) benchmark was also employed, calibrated on the same pre-election datasets. On average, AR yields reliable forecasts, surpassing those of CC and SA, yet falling short of BPP and the two TA methodologies we propose. The same benchmark was also used for the real-time feasibility analysis in two electoral systems (USA’20 and USA’24), where, once again, our TA methods emerged as the superior electoral predictors, displaying reduced sensitivity to the social fluctuations revealed by pre-election surveys.
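Such an autoregressive benchmark can be illustrated with a minimal AR(1) fit. This is a sketch only, assuming an ordinary least-squares estimate of the lag-1 recurrence and hypothetical daily poll values; the study's actual ARIMA configuration is not specified here.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1], an AR(1) model:
    the simplest autoregressive member of the ARIMA family."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_p = sum(x_prev) / n
    mean_n = sum(x_next) / n
    cov = sum((a - mean_p) * (b - mean_n) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_p) ** 2 for a in x_prev)
    phi = cov / var if var else 0.0
    c = mean_n - phi * mean_p
    return c, phi

def forecast_ar1(series, steps, c, phi):
    """Iterate the fitted recurrence forward, e.g. to election day."""
    x = series[-1]
    for _ in range(steps):
        x = c + phi * x
    return x

# Hypothetical daily poll shares (%) for one candidate
polls = [48.0, 48.5, 49.0, 49.2, 49.6, 50.1]
c, phi = fit_ar1(polls)
prediction = forecast_ar1(polls, steps=3, c=c, phi=phi)  # share 3 days ahead
```

Unlike TA, the AR fit treats the series as a stochastic process and extrapolates its trend, which explains its sensitivity to short-term poll fluctuations noted above.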
Our current study also presents certain limitations that we further examine in detail. First, we view social networks and social media as a widespread means of information dissemination, yet there exist individuals who are so-called ‘non-users’ [
56,
57]. We adopted the simplifying assumption that published opinion polls reach all voters in a system, mainly because of the challenges of obtaining data on offline users and the questionable trustworthiness of such data. According to multiple official statistics, approximately 67–75% of the global population interacts with social media. Despite this, we maintain that our model’s simplification is valid, as research on political attitudes reveals no statistically significant differences between social media users and non-users concerning political awareness, values, or behavior [
56].
Next, a further simplifying assumption is that the modeled electoral system is relatively resistant to external manipulation, eliminating the need to account for data beyond our control (i.e., external influences). Based on the ‘liberal democracy index’, designed to assess the resilience of political systems (on a scale of 0 to 1), we measure values between 0.11 (Turkey) and 0.81 (France), with an average of 0.59, according to research conducted by the Swedish V-Dem institute [
48]. Therefore, we can consider electoral systems such as the Argentinian, Brazilian, Canadian, French, and US ones to be robust against external manipulation; the Indonesian, Polish, and Romanian systems are borderline, while the Turkish system is considered less robust against opinion manipulation. Nevertheless, we wanted to showcase how TA performs in various systems around the world, and we simply consider possible opinion manipulation to be reflected in the dynamics of the pre-election polls.
It is also worth noting that the presented results summarize the average performance of TA based on the best trade-off combination of the model parameters. Given the globally chosen parameter values, the results on some datasets will, naturally, be weaker than those of the competing pollsters.
Another limitation of our TA model is its data dependency, namely a strong reliance on pre-election poll data. The accuracy of TA predictions depends, like that of most forecasting models, on the availability and quality of these polls: if polling data are sparse or inconsistent, the model’s performance may degrade. Regional bias could also prove a limitation in forecasting; for example, polling accuracy is considered to vary widely between established democracies and systems with weaker electoral transparency. Our model does not integrate social media sentiment analysis, which could help TA capture opinion shifts triggered by viral events directly, rather than indirectly through shifting opinion polls.
Finally, in terms of handling unreliable or manipulated data, we consider that the results of any data-dependent model are only as good as the data they rely on. We used publicly available data considered to originate from reputable pollsters in each country. In other words, we did not have access to data known to be manipulated. An interesting scenario is that of the recent Romanian presidential elections (first round), where no pollster was able to predict or capture the fact that a relatively unknown and unaccounted-for candidate would win the first round, surpassing all major parties. Thus, the registered errors for all pollsters and for our TA are very high compared to the other datasets (see ROM’24 in
Table 2 and
Table 3). This situation shows that, with low-quality opinion polls, forecasting methods will also produce weak results.
Future analyses should also prioritize examining current trends in voter polarization and the reliability of polls [
58]. However, our model for predicting election results is intentionally structured to remain largely independent of various societal and political contexts, such as the influence of polarized opinions.
Conclusions
With the surge in data availability and computational capabilities, contemporary election forecasting models are likely to develop in one of two directions: either micro-level systems utilizing detailed social media data, or macro-level systems using advanced data-science methods to analyze demographic and economic indicators. Nevertheless, our proposed methodology strikes a balance between these micro and macro perspectives, resulting in a straightforward, intuitive, and robust approach that can be applied to any pre-election dataset with temporal attributes. We consider this streamlined approach to be effective, because social influence frequently aligns with the concept of crowd wisdom [
52]. In essence, the collective evaluation of a large group (macro level) can surpass the precision of individual experts’ assessments (micro level) [
53]. This phenomenon becomes more pronounced when larger populations are considered [
52].
Although the TA methodology might seem simplistic due to its reliance on a micro-scale opinion interaction model to predict macroscopic outcomes, our findings suggest that temporal awareness plays a more crucial role in election prediction than previously acknowledged. In our analysis, TA surpasses advanced election-forecasting methods in 6 of the 10 electoral datasets studied. TA achieves an average forecasting error of 6.92–6.95 points, while statistical methods yield errors of 16.09–17.36 points, AR yields an average error of 9.81 points, and the BPP error lies at 7.65 points. This corresponds to an approximate 10–41% improvement in the accuracy of popular vote predictions using our approach, compared to BPP and AR, respectively.
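The reported improvement figures can be reproduced from the stated average errors, assuming (our reading; the text does not spell out the formula) that the gain is expressed relative to the TA error itself, taken at the upper end of its range:

```python
def improvement(competitor_err, ta_err):
    """Relative error reduction, expressed against the TA error (in %)."""
    return (competitor_err - ta_err) / ta_err * 100

ta_err = 6.95        # upper end of the TA average error range (points)
bpp_err, ar_err = 7.65, 9.81   # best pollster prediction and ARIMA errors

vs_bpp = improvement(bpp_err, ta_err)  # ≈ 10.1%
vs_ar = improvement(ar_err, ta_err)    # ≈ 41.2%
```

Rounded to whole percentages, these values match the stated 10–41% range.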
Examining the techniques employed by US pollsters, such as Real Clear Politics, FiveThirtyEight, and Understanding America Study, we have yet to identify any method based on temporal attenuation comparable with our methodology. Traditional statistics-based and data-science methodologies typically depend on particular social, economic, and political contexts for refining their forecasts. In contrast, our TA method functions independently of socio-economic contextual data, which we argue is a beneficial attribute. While achieving a flawless forecasting system is unlikely due to the complexities inherent to elections, our TA offers a unique and scientifically distinguishable alternative with demonstrated high efficacy.
Beyond electoral forecasting, the TA framework offers potential for broader applications. Its ability to model temporal dynamics and opinion evolution can extend to predicting outcomes in other domains where time-sensitive data play a critical role. Applications include natural disasters, by forecasting response patterns during emergencies such as hurricanes or pandemics; epidemiology, by modeling the dissemination of health-related information or disease outbreaks; and economic forecasting, by predicting market sentiment and economic trends through analysis of the temporal impact of external events on public opinion or consumer behavior. By adapting the TA methodology to other domains, researchers may improve forecasting accuracy and decision-making across various fields where time-sensitive patterns are crucial.