1. Introduction and Theoretical Framework: A Conceptual Vision of Disinformation/Misinformation and the Importance of Biases in Decision Making
The term ‘disinformation’ denotes the act of inducing confusion in public opinion through the use of false information. The Royal Spanish Academy defines it as giving intentionally manipulated information in the service of certain ends, or giving insufficient information or omitting it altogether. In English, however, the two related concepts differ in an important way: misinformation refers to false or incorrect information, while disinformation refers to information deliberately given with the aim of creating confusion or fear among the population. In Spanish, both concepts are covered by the single term ‘disinformation’ (desinformación). To quote Southwell et al. [
1], misinformation is false or inaccurate information regardless of intent; nonetheless, much of the discussion about misinformation has focused on malicious attempts to flood social media platforms with false information. As for its effects, misinformation poses a risk to international peace, interferes with democratic decision making, endangers the well-being of the planet, and threatens public health [
2]. Thus, the concept of misinformation as a problem appears prominently in recent academic literature and public discourse [
1].
Thus, from semiotic and psychological perspectives, disinformation occurs when what is communicated does not fit with the current reality of an object. Political science and public relations approaches see disinformation as an application of manipulation techniques to the masses and public opinion, while a communication and information science perspective considers it a natural characteristic of news media and of a clearly oversaturated communicative ecosystem [
3].
The digital ecosystem of social communication, saturated by the rise of social media and the crisis within news media, is a fertile breeding ground for the growth of disinformation in our daily lives. The study of disinformation must begin with a behavioural examination of the symbolic elites that may be able to manipulate collective thinking in favour of their own interests [
4]. This structure of restricted information, in part, moves away from being a clear expression of reality and becomes a prudent selection of a public agenda under the triangulation of three fundamental elements of communicative research: discourse, cognition, and society [
4], in which a discursive and semiotic analytical focus is required because the majority of disinformative content is spread via text, the spoken word, or image. This takes place in a society that is increasingly incapable of accessing privileged data for itself, which generates an avid demand for information via intermediaries as society strives to understand its own shared realities [
3].
Disinformation has developed exponentially with the COVID-19 pandemic, but its effects had already begun to grow markedly from 2018 onwards [
5]. Three studies carried out in Spain during the pandemic and post-pandemic time periods produced important conclusions relating to disinformation: Salaverría et al. [
6] studied the fake news generated in Spain during the first month of the State of Alarm, as identified by Spanish verification platforms. The research group found that these fake news items generally originated on social media, mainly on WhatsApp. In addition to fake news about science and technology, a large amount of false content was found relating to political and governmental subjects [
7]. Another study, by Noain-Sánchez, centred on the disinformation generated in Latin America and Spain between 1st January and 1st June 2020 [
8]. The study was carried out by observing accredited fact-checking platforms. The conclusions indicate that most fake news was spread in text format, and that the most common medium of dissemination was social media platforms: Facebook in Latin America and WhatsApp in Spain. Disinformation in this case not only affected health but also politics. The study by [
9] was carried out a year after the start of the pandemic, when the news was focused on vaccines and the vaccination process. Our team found that in this research, too, the diffusion of fake news centred on political debate. In this case, dissemination took place via Twitter and WhatsApp [
9].
Disinformation can lead to so-called systematic error, which occurs when citizens select or favour certain responses over others. This is when cognitive biases appear: shortcuts taken by the brain when it processes information. These shortcuts can hinder decision making and generate irrational and incorrect behaviour (Kahneman, 2011) [
10]. In his work about the cognitive biases generated in the COVID-19 pandemic, researcher Castro Prieto posed an interesting question: Why do people sometimes show a tendency to take steps that are not beneficial to them or to society? Faced with uncertainty, our brain resorts more to these biases, generating quick and impulsive responses to decisions that require evaluation. Bias analysis is connected to guidelines of social behaviour that are directly related to decision making. Decisions are not always taken in a rational way [
11], and, in this sense, decision making can lead to erroneous outcomes. As soon as a relationship emerges between news, bias, and behaviour, that relationship should be studied and analysed from scientific and academic perspectives.
The research draws on international literature to establish a list of biases to be taken into account in the process. Building on the work of Kahneman and Tversky, various research teams have centred their investigations on a series of biases worth considering. It comes as no surprise that these authors conclude that human rationality is affected by behavioural or cognitive heuristic biases. Similarly, Cerezo, aiming to detect biases produced during the pandemic, gives a brief summary of these types of cognitive bias: one of them is loss aversion, which forms part of prospect theory [
12] and explains how humans give more importance to a loss than a gain; in the case of a pandemic, a population will think more about what it is losing with the crisis than what it stands to gain by following proposed health measures. According to the author, when this happens it is those responsible for public health policy who should mitigate this overvaluation of losses. Another suggested bias is the so-called carry-over effect, also known as the bandwagon effect: doing what the majority are doing with no concern for the correctness or suitability of the action given the circumstances. One example is how, in the first stages of the pandemic, people bought more food than they needed. The way to deal with this bias is by appealing to rationality. Another bias stems from the little value given to long-term consequences. This bias undermines the motivation to carry out certain actions when the threat they address is not observed in the short term. It is also known as present bias, and shows that we prefer to enjoy the present rather than think about future consequences. The authorities should promote hopeful messages to counteract this bias, showing the short-term results of proposed measures. Optimism bias is based on our belief that it is unlikely that we will experience a negative event. In terms of illness, this bias makes us see ourselves as less vulnerable than other collectives. The proposed solution is clear messages from health authorities and news media: emotional messages based on sentiments (i.e., empathy, fear, and individual responsibility). Availability bias, which caught Freudenburg’s attention (1993) following the research of [
12], overestimates the most accessible or closest information and can be mitigated by reinforcing the idea that citizens are responsible and obey rules. Good image bias translates into the continuation of habitual dynamics, paying no attention to advice involving a change of attitude or behaviour. In some way, those who develop this bias think that not doing what is recommended shows them in a more favourable light, making them more attractive than the rest. It can be combatted by appealing to and involving leaders in the promotion of recommended behaviour. Confirmation bias involves evaluating information from the standpoint of existing ideas and beliefs, which means we always find reassurance for our own arguments. Thus, the search for information centres on news that supports our own points of view, favouring only the information that interests us. This bias is related to post-truth theory and to fake news and conspiracy theories. In this case, it is necessary that health authorities and news media, along with opinion leaders, neutralise these narratives (Cerezo, 2020) [
11]. Authors such as Helena Matute (2019) [
13] suggest other types of bias, such as that of familiarity, which leads the brain to trust what it knows and mistrust what is new, or that of the illusion of causality.
Table 1 summarises these biases, citing the authors most closely associated with each, highlighting Kahneman and Tversky [
12], among others like Leibenstein [
14], Freudenburg [
15], Lewicki [
16] and Nickerson [
17], the pioneer being Leibenstein, who described the carry-over bias in 1950.
Disinformation campaigns do not usually limit themselves to the dissemination of fake news; they aim to create a malicious narrative, largely built on the idea of post-truth. Disinformation has become the scourge of our times, infringing deontological codes and undermining democracies. It therefore presents a challenge for the information professionals fighting against it, but also an opportunity, since it must be confronted with the veracity of facts, rigour, and ethics.
The Oxford Dictionary defines post-truth as the consequence of a political climate in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief [
18]. Thus, post-truth arises when what is important is not the truth but perception, and so the conflict between your facts and mine is resolved without worrying about which of these is based on truth [
19,
20]. The problem is that inaccurate information, rumours, and conspiracy theories can seriously impair people’s ability to make health, environmental, political, social, and economic decisions that are crucial to their lives [
21].
News media, along with politics, are also responsible to some extent for the construction of post-truth via the propagation of fake news.
The endorsement of disinformation has negative consequences for public health since there is a tendency to believe that the official information exaggerates the risks of COVID-19 [
22]. The conclusion of this work highlights the effect that the disinformation from conservative news media had on public health; an example is the tendency of consumers of these media to think that politicians were exaggerating the importance of COVID-19 [
22], a misconception that had serious consequences. At present, post-modernity and post-truth seem to proclaim an end to the search for truth; what is sought nowadays is the legitimisation of ideology. Whether what is alleged is true or false no longer matters; what matters is the willingness to grant it the category of truth [
23]. The concept of fake news is prior to that of post-truth, but the two go hand-in-hand.
2. Methodology
The methodology for the bibliometric analysis based on the semantic content of abstracts is articulated around several fundamental steps (see
Figure 1), aimed at exploring and visualising research trends as well as thematic patterns in the academic literature:
For this purpose, data collection was carried out using three sources, Scopus, Web of Science (WoS), and Dimensions, to ensure a broad and diversified dataset; quantitative and qualitative variables were also created for use in the subsequent analyses. The search queries were ‘health AND disinformation AND bias’ and ‘health AND misinformation AND bias’. The first query returned 102 publications and the second 680; no filters of any kind were applied. The research areas of the publications were also retrieved. The initial corpus thus consisted of 782 publications from the three data sources.
The next step was the standardisation of categories across databases to unify the analysis. Standards in bibliometric research understand that the ideal practice is to use a single type of database where indicators are already standardised [
24]. In our case, this was not possible due to the high atomisation and specialisation of the databases; however, no relevant metadata gaps were found, and differences in metadata were mitigated by selecting higher-quality metadata, such as the number of citations of a publication (cited or cited count).
Data cleaning was carried out in different phases: Firstly, we eliminated those publications that did not contain an abstract, those publications whose typology corresponded to preprint, letter, note, or survey, and duplicate publications, considering the title, abstract, and DOI, after standardising spelling and punctuation marks. In the event that the same publication was found in two or more data sources, priority was given to maintaining the highest number of citations. On the other hand, the research areas were standardised in their nomenclature, assuming the Scopus classification to be valid (i.e., the Dimensions and WoS areas were translated into the Scopus ones), as it is the most generic and inclusive. The final dataset was reduced to 374 publications.
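As an illustration, the cleaning steps described above can be sketched with pandas. The column names (`title`, `abstract`, `doi`, `type`, `cited`) and the toy records are hypothetical stand-ins for the actual database exports:

```python
import pandas as pd

# Hypothetical corpus with the metadata fields described in the text.
df = pd.DataFrame({
    "title":    ["A", "A", "B", "C"],
    "abstract": ["x", "x", "y", None],
    "doi":      ["10.1/a", "10.1/a", "10.1/b", "10.1/c"],
    "type":     ["article", "article", "preprint", "article"],
    "cited":    [5, 12, 3, 7],
})

# 1. Drop publications without an abstract.
df = df.dropna(subset=["abstract"])

# 2. Drop excluded typologies (preprint, letter, note, survey).
df = df[~df["type"].isin({"preprint", "letter", "note", "survey"})]

# 3. Deduplicate on title/abstract/DOI, keeping the record with the
#    highest citation count, as described in the text.
df = (df.sort_values("cited", ascending=False)
        .drop_duplicates(subset=["title", "abstract", "doi"], keep="first"))
```

In a real pipeline, spelling and punctuation would be standardised before the duplicate check, as the text notes.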
Next, the semantic analysis of the abstracts was carried out using the LaBSE model (i.e., the language-agnostic BERT sentence embedding model [
25]) for sentence embedding, a deep learning-based natural language processing (NLP) technique that converts sentences into vectors based on prior training. These vectors represent the semantic features of sentences in a multidimensional space. After vectorising the abstracts in this way, abstracts with similar meanings lie close to each other in the vector space, while abstracts with different meanings lie further apart.
In order to minimise the loss of semantic information in the vectors, the distances between them were calculated from the cosine similarity, a calculation commonly used in the context of sentence embeddings because of its ability to deal with multidimensional data [
26]. Then, a graph of semantically similar abstracts was constructed from the highest values obtained in the previous calculation: those pairs with a cosine similarity equal to or greater than 0.9. The threshold was chosen in conjunction with the modularity (see below) to obtain a graph partition that would allow relevant differences between communities to be observed without decomposing the graph into isolated subgraphs. This process produced a mathematical object with which to analyse the different narratives in the corpus of abstracts. Additionally, cosine similarities greater than 0.99 were used to identify and eliminate previously undetected duplicate publications from the network. The final abstract network consisted of 366 nodes and 4394 edges.
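The similarity-and-threshold step can be illustrated with a toy example; the five-dimensional vectors below merely stand in for the real LaBSE embeddings:

```python
import numpy as np
import networkx as nx

# Toy embedding matrix standing in for the LaBSE abstract vectors
# (4 "abstracts", 5 dimensions; values are illustrative only).
emb = np.array([
    [1.00, 0.00, 0.0, 0.0, 0.0],
    [0.99, 0.10, 0.0, 0.0, 0.0],
    [0.00, 1.00, 0.0, 0.0, 0.0],
    [0.00, 0.98, 0.2, 0.0, 0.0],
])

# Cosine similarity: normalise each row, then take the dot product.
norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
sim = norm @ norm.T

# Keep only pairs at or above the 0.9 threshold used in the study.
G = nx.Graph()
G.add_nodes_from(range(len(emb)))
for i in range(len(emb)):
    for j in range(i + 1, len(emb)):
        if sim[i, j] >= 0.9:
            G.add_edge(i, j, weight=float(sim[i, j]))
```

With these toy vectors, only the two near-parallel pairs (0, 1) and (2, 3) survive the threshold, so the graph splits into two linked pairs.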
The graph was transformed into an undirected network with the Python library NetworkX [
27]. The modular structure of the network was then identified with the Louvain algorithm [
28]. This is a procedure that identifies communities in networks based on modularity optimisation, which is a measure that quantifies the structure of modules, clusters, or communities within a graph [
29]. In this case, the modularity value achieved was 0.203: a low value, indicating a set of poorly differentiated narratives (which is congruent with applying such a methodology to a very specific and delimited scientific field), and one that also derives from the methodological decision to use the 0.9 threshold. Throughout, we bear in mind that the communities analysed have many more elements in common than distinctive ones, which also reflects the particularities of the empirical object at hand. Next, the weighted degree of the nodes of the graph was calculated (i.e., the sum of the weights of the edges incident to each node). All of these data were exported and added to the initial dataset.
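The community detection and weighted-degree steps can be sketched with NetworkX (the small graph below is invented for illustration; `louvain_communities` requires NetworkX 2.8 or later):

```python
import networkx as nx

# Small illustrative graph: two dense triangles joined by one weak bridge,
# with edge weights mimicking high cosine similarities.
G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 0.95), (0, 2, 0.93), (1, 2, 0.94),   # group A
    (3, 4, 0.96), (3, 5, 0.92), (4, 5, 0.95),   # group B
    (2, 3, 0.90),                                # bridge between groups
])

# Louvain community detection via modularity optimisation.
communities = nx.community.louvain_communities(G, weight="weight", seed=42)

# Modularity quantifies how well the partition separates the graph.
Q = nx.community.modularity(G, communities, weight="weight")

# Weighted degree: sum of the weights of the edges incident to each node.
wdeg = dict(G.degree(weight="weight"))
```

On this toy graph the algorithm recovers the two triangles as communities; node 0's weighted degree is 0.95 + 0.93 = 1.88.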
With the aim of deepening the semantic analysis of the communities obtained from the modularity calculation, a bigram analysis was carried out on the abstracts of the publications to capture contextual relationships between words. Bigrams are pairs of consecutive elements taken from a sequence, such as two co-occurring words in a text, and are calculated after removing the stop words (i.e., function words) from the texts. Two kinds of bigram calculations were carried out on the abstracts: on the one hand, a bigram cloud for each community and, on the other, two bigram clouds for all communities covering the pre- and post-2020 time spans. In addition, bigrams were computed around specific terms of interest, in particular the word bias. Thus, as shown below, each community has a bigram cloud reflecting the contexts of the abstracts of that community's publications, two general bigram clouds are analysed for the periods before and after 2020, and two separate analyses examine the bigrams built around the word bias.
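The bigram extraction can be sketched as follows; the two example abstracts and the deliberately reduced stop-word list are illustrative only:

```python
import re
from collections import Counter

# Minimal stop-word list for the sketch (a real analysis would use a
# fuller list, e.g. from NLTK or spaCy).
STOP = {"the", "of", "in", "and", "a", "on", "is", "to"}

def bigrams(text):
    """Tokenise, drop stop words, and pair consecutive tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP]
    return list(zip(tokens, tokens[1:]))

abstracts = [
    "Confirmation bias shapes the spread of health misinformation on social media.",
    "Publication bias and confirmation bias in health research.",
]

# Count bigram co-occurrences across the whole corpus of abstracts.
counts = Counter(bg for a in abstracts for bg in bigrams(a))

# Bigrams containing a term of interest, here 'bias', as in the study.
bias_bigrams = {bg: n for bg, n in counts.items() if "bias" in bg}
```

The same counting routine, run per community and per time span, yields the bigram clouds described above.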
To visualise the data, Gephi was used to lay out the graph with ForceAtlas2, a force-directed algorithm that moves linked nodes closer together and pushes unlinked ones apart [
30]. The rest of the visualisations were made with Python in the case of the bigram clouds for each cluster—both those describing the co-occurrences in the abstracts of the publications of each cluster and the bigram clouds containing the word bias in one of the two n-grams—and PowerBI for the rest of the visualisations.
The exploration of results was carried out in two stages: Firstly, bibliographic production and impact were analysed on the basis of general indicators. Secondly, a detailed analysis of each community in the graph was carried out. The methodology includes a detailed examination of topics within specific research communities based on the identified communities, as well as a collective analysis of topics outside these communities (i.e., communities isolated from the graph; not exceeding the threshold of 1% of nodes).
4. Discussion and Conclusions
Having observed the results obtained in the search and analysis process, this research shows the existence of academic articles and papers that have dealt with biases and their relationship with how academia has treated disinformation. In terms of community analysis, the results show considerable academic dedication to the study of biases and disinformation. In all these communities we observed a relationship between production, with the field of medicine as a general theme, and social media, with this connection always tied to other subject matter: aversion to vaccines in Community 10; disinformation about COVID-19 on social media in Community 5; COVID-19 and conspiracy theories in Community 6; and material for the dissemination of health-related subjects on YouTube, as well as the disinformation surrounding them.
This community analysis reveals a common factor in all of the communities analysed—that of cognitive bias; however, it must be taken into account that, according to various authors, there are significant relationships between different types of bias, and this helps people to protect the image they have of themselves, leading them to analyse information in a particular and partial way [
38].
Detail about the types of cognitive bias treated in the academic publications can be found in the second synopsis of the Results section: the bigrams containing bias for each community. Thus, Community 2 deals with confirmation bias, along with other cognitive biases such as publication bias and commercial bias. The first involves analysing and interpreting information from existing ideas and beliefs, giving preference to one's own arguments. In the same analysis, Community 10 refers to risk bias and also to publication bias, focussing in this case on papers about the COVID-19 pandemic that were accepted or rejected depending on the results produced.
Other biases were taken into account in the preparation of this work, such as optimistic self-perception, which appears in Community 5 and refers to the tendency to think that bad things will not happen to oneself; in the collected data, it appears as optimism bias. Additionally, prejudices and errors of reasoning, which are cognitive and unconscious biases, appear in the analysis of Community 6. Similarly, commercial bias appears in Community 0, in a relationship between commercial interests and information bias.
Therefore, it can be seen how the academic literature has treated the analysis of cognitive biases present in the relevant published information: the analysis of information generated at a particular time, in a particular context, and in particular circumstances, finding common elements in the search for biases. A large number of the biases considered in this work appeared in the analysis. Two that had not been considered deserve special interest: commercial bias and publication bias. The first favours an informative focus based not on objectivity but on certain economic market interests. Publication bias is also particularly important, since it filters information, in this case academic, depending on the results of research. This is highly damaging to academic research, which loses objectivity the minute publishers prefer some results over others, in a sector founded precisely on objectivity. Special attention should be paid to this bias because it tends to create a loop, a spiral in the world of science, a sector in which every new academic production builds on previous studies. Are we facing a spiral of disinformation generated by the publishers of certain scientific journals? Awareness of this issue is important in order to detect and avoid the so-called ‘scientific fraud’ that has been reported in the press for years. What is more, these are biases generated not by the psychology of citizens in the form of decision-making shortcuts, whatever their particular consequences, but by communication companies with an interest in publishing depending on what type of results are produced about health information.
It is also important to highlight the intentions of the research teams whose papers have been analysed in the defence of communication for health education, centred on the predominance of social media in the analysis that has revolutionised the way of accessing information about medicine and health [
39]. In short, the importance of information and knowledge in the development of processes of political—or even economic—democratisation [
40] indicates that communication is the central axis of this development. But this axis should not be limited; health education plays an important role in the development of healthy, capable, free, and empowered societies.
Beyond the comments made on academic production in the mentioned fields, our research shows that there is a relationship between biases and disinformation when it comes to people’s decision-making processes, including for health-related decisions. It also confirms that the existence of cognitive biases generates effects such as those mentioned in this article: interference in decision making, danger to the well-being of the planet, and a threat to public health. At the same time, it underlines the WHO’s position by referring to an infodemic and the impossibility of addressing health problems—among others—without first solving the issue of disinformation.
This study set out to identify the biases at work in the era of health-related misinformation. Disinformation is not simply bad or false information directed at communities of users; it is also selective information that circulates even among isolated and disconnected groups. One important contribution of this study is its novel methodology: a bibliometric approach that combines social network analysis with discourse analysis of abstracts from the international academic literature. Its main contribution is to shed light on the relationship between cognitive biases and health, exploring topics of interest to society such as vaccine hesitancy and the COVID-19 pandemic. Future research will focus on understanding and analysing what makes communities and individuals less susceptible to manipulation, drawing on data that can help mitigate mass manipulation.
Beyond the search for this relationship between biases, it is clear that the academic literature has dedicated space to articles or papers that have observed how these subjects have developed during times of pandemic.