Article

The Effects of Assumed AI vs. Human Authorship on the Perception of a GPT-Generated Text

by Angelica Lermann Henestrosa 1,* and Joachim Kimmerle 1,2
1 Knowledge Construction Lab, Leibniz-Institut für Wissensmedien, 72076 Tübingen, Germany
2 Department of Psychology, Faculty of Science, Eberhard Karls University, 72076 Tübingen, Germany
* Author to whom correspondence should be addressed.
Journal. Media 2024, 5(3), 1085-1097; https://doi.org/10.3390/journalmedia5030069
Submission received: 4 July 2024 / Revised: 6 August 2024 / Accepted: 15 August 2024 / Published: 20 August 2024

Abstract:
Artificial Intelligence (AI) has demonstrated its ability to undertake writing tasks, including automated journalism. Prior studies suggest no differences between human and AI authors regarding perceived message credibility. However, research on people’s perceptions of AI authorship on complex topics is lacking. In a between-groups experiment (N = 734), we examined the effect of labeled authorship on credibility perceptions of a GPT-written science journalism article. The results of an equivalence test showed that labeling a text as AI-written vs. human-written reduced perceived message credibility (d = 0.36). Moreover, AI authorship decreased perceived source credibility (d = 0.24), anthropomorphism (d = 0.67), and intelligence (d = 0.41). The findings are discussed against the backdrop of a growing availability of AI-generated content and a greater awareness of AI authorship.

1. Introduction

Automated text generation (ATG) has been garnering significant attention since the release of ChatGPT (Chat Generative Pre-trained Transformer) made it freely available to everyone with internet access. Although ATG has been used for over a decade in areas where structured, machine-readable data were available (e.g., automated journalism), it long remained a niche topic that received less attention than other developments in artificial intelligence (AI). The availability of large datasets, increased computational power, advances in deep learning, and the introduction of the transformer architecture in 2017 (Vaswani et al. 2017), which is the backbone of various models (e.g., GPT from OpenAI or BERT from Google), led to a development boost in natural language generation (NLG) and large language models (LLMs). Language, especially written language, is no longer the exclusive preserve of humans. Moreover, owing to their training data, LLMs can now write on any topic imaginable. Whether they are also competent in terms of content is a different matter.
Before the emergence of LLMs, ATG had already been an established method in short news reporting, for example. Using structured, machine-readable data, AI-based algorithms can convert raw data, such as a city's weather parameters, into a coherent verbal weather report that is no longer distinguishable from a human-written text (Brown et al. 2020; Köbis and Mossink 2021). Since the release of ChatGPT in November 2022, the possibilities for text generation beyond journalism have become apparent. With today's LLMs, tools are at hand that can take away the effort of writing on any topic, a single prompt away. Despite their apparent pitfalls and limitations, they open up new possibilities for knowledge access and science communication. For example, scientific information could be made more understandable and approachable by addressing specific target groups, explaining facts in a way that gets to the heart of the matter, and summarizing and breaking down complex information.
Because ATG was limited in its capabilities and received scarce attention before the release of ChatGPT, research is lagging regarding readers' perceptions of this specific form of AI and of AI authorship as a novel source cue, especially for topics other than news reports. However, studies from the field of automated journalism suggest that at least these texts do not differ from human-written texts in their perceived credibility (Graefe and Bohlken 2020; Jang et al. 2021; Tandoc et al. 2020; Wölker and Powell 2021) or that machine authorship has only a small negative effect on credibility perceptions (Wang and Huang 2024). In a meta-analysis, Graefe and Bohlken (2020) found no difference in credibility perceptions between human and AI authorship across several number- and fact-based topics, such as sports reports or election polling. However, there might be meaningful differences between the fully automated news generation based on structured data that Graefe and Bohlken examined and the support provided by generative AI tools, which has only recently emerged. In their meta-analysis, Wang and Huang (2024) found that assuming an AI rather than a human author decreased credibility perceptions for socio-political and environmental topics but had no effect on news evaluations otherwise. Their findings also revealed a moderating role of the actual source on news evaluation, which could have been due to the quality of AI texts at the time and might now be less of a problem. Proksch et al. (2024), by contrast, investigated the effects of labeled authorship on moral topics and found lower ratings of author competence and content quality for AI authorship. Likewise, Böhm et al. (2023) found lower competence ratings for AI-generated content on societal and personal challenges simply when the labeled source was an AI. These findings indicate a role for the task context and the topic chosen.
While automated journalism research has focused on short news reporting, Lermann Henestrosa et al. (2023) extended this comparison to the topic of science communication. The authors found no differences in message credibility and trustworthiness between human and AI authors. Still, participants in their study differentiated between the alleged authors regarding perceived anthropomorphism and intelligence, rating the human author significantly higher than the AI author on both (Lermann Henestrosa et al. 2023). As former research often used only allegedly AI-written texts and was limited to the textual possibilities available at the time, research on actual AI-written content on broader topics is steadily gaining momentum. Moreover, due to the small sample sizes of prior studies, small effects could hardly be detected, which might have led to inconsistent results or null effects.
Therefore, the present study aims to investigate the influence of labeled authorship on the perceived credibility of the message and of the source of an AI-written science journalism article. More specifically, our research question was whether there is no difference in the perception of labeled AI vs. human authorship of an actually AI-written scientific article, as previous findings suggest (Graefe and Bohlken 2020; Lermann Henestrosa et al. 2023; Wang and Huang 2024). This extends previous research by using a genuinely AI-written text and by expanding the topic to reflect the current possibilities of LLMs. Using equivalence testing, we responded to the small or non-significant effects of labeled authorship found in prior research. We preregistered the following hypothesis at https://aspredicted.org/6BP_355 (accessed on 14 August 2024):
H1: 
The mean difference in perceived message credibility scores in the two conditions (AI author vs. human author) will be equivalent: the article allegedly written by a human author will be perceived as statistically equally credible as the article allegedly written by an AI.
Furthermore, previous findings by Lermann Henestrosa et al. (2023) suggest significant differences in authorship perceptions—despite constant material—between alleged AI vs. human authorship, with the AI being perceived as less human-like and less intelligent than the human author. Therefore, we stated the following two hypotheses concerning the perceived anthropomorphism and intelligence of the respective authors:
H2: 
There will be a main effect of the factor authorship on the perceived anthropomorphism of the author: the alleged human author will be perceived as more anthropomorphic than the alleged AI.
H3: 
There will be a main effect of the factor authorship on the perceived intelligence of the author: the alleged human author will be perceived as more intelligent than the alleged AI.
Going beyond previous studies, we included perceived source credibility to directly assess the credibility of the alleged author, posing the following open research question:
RQ1: Is there any effect of labeled authorship on participants’ perceived source credibility?
To control for possible effects of attitude toward the text topic, we included the participants’ prior attitudes exploratively as covariates.

2. Methods

The study was an online experiment with a one-factorial between-groups design (factor labeled authorship: AI author vs. human author). Participants were asked to read a science journalism article about biodiversity, specifically the spread of wolves in Germany, and to rate it on different scales. The exact same text was presented in both conditions; the only difference was the labeled authorship introduced beforehand. The article was created with the autoregressive language model GPT-3 (OpenAI), a predecessor of the model underlying ChatGPT, according to the following procedure: We used the Davinci engine of the OpenAI Playground with the following parameters: number of tokens = 100, temperature = 0.8, frequency penalty = 0.0, and presence penalty = 0.0. As prompts, we used five content-structuring sentences adapted from the material of Lermann Henestrosa et al. (2023), which were first translated into English and then used to generate five trials per prompt (see Appendix G). From the generated output, the GPT continuation of each of the five trials that most closely matched the tone of a scientific article in terms of content was selected to obtain a complete paragraph. Thirty-three words had to be deleted because the autocompletion stopped at 100 tokens, resulting in incomplete sentences at the end of the paragraphs (e.g., "In Thuringia" was deleted from the selected autocompletion for the second paragraph). Afterward, the generated text was translated into German; it consisted of 353 words. Apart from adding three missing punctuation marks, the paragraphs were not edited.
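For illustration, the following sketch shows how such completions could be generated with the legacy OpenAI Completions API (the openai Python library before version 1.0), which backed the Playground at the time. It is a minimal reconstruction rather than the authors' actual workflow: the prompt is the first structuring sentence from Appendix G, and we assume the reported "frequency" and "penalty" settings correspond to the API's frequency_penalty and presence_penalty parameters.

```python
# Minimal sketch of the text-generation step, assuming the legacy OpenAI
# Completions API (openai<1.0). Parameter names are our mapping of the
# settings reported above; the prompt comes from Appendix G.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = "The wolf is spreading again in Germany"

# Five trials per prompt, as described in the procedure
completions = [
    openai.Completion.create(
        engine="davinci",        # Davinci engine of the GPT-3 Playground
        prompt=prompt,
        max_tokens=100,          # number of tokens = 100
        temperature=0.8,         # temperature = 0.8
        frequency_penalty=0.0,   # reported as "frequency = 0.0"
        presence_penalty=0.0,    # reported as "penalty = 0.0"
    )
    for _ in range(5)
]

for i, c in enumerate(completions):
    print(f"Trial {i + 1}:", c.choices[0].text.strip())
```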

2.1. Sample

A power analysis for a small effect size of d = 0.21, an alpha-error probability of 0.05, and a power of 0.80 revealed that a total sample size of N = 714 participants was needed for the intended equivalence test for H1. Data from a random and fully anonymized sample were collected via the recruiting platform Prolific. The only prerequisites were that participants had to speak German and be over 18 years old. Of the 800 participants in the online experiment, 66 were excluded from the analysis due to preregistered exclusion criteria. The final sample consisted of N = 734 participants with a mean age of 29.04 years (SD = 9.65). Participation took about 10 min, and participants were compensated with 1.25 GBP. Table 1 shows the absolute and relative distributions of gender and education. For the classification of educational level according to school type, see Oehler et al. (2024).
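The paper does not state which tool performed this power analysis. As a rough cross-check, the power of a TOST design can be approximated by simulation; the sketch below estimates the power of the planned equivalence test at the targeted sample size under a true effect of zero (all names and settings here are illustrative, not the authors' original computation).

```python
# Simulation sketch (not the authors' original power analysis): estimate the
# power of a TOST equivalence test with bounds of +/- d = 0.21 at alpha = 0.05
# for n = 357 per group (N = 714 in total), assuming a true effect of zero.
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(2024)
sesoi = 0.21           # equivalence bounds in SD units (data below have SD = 1)
n_per_group = 357
n_sims = 2000

hits = 0
for _ in range(n_sims):
    x1 = rng.normal(0.0, 1.0, n_per_group)
    x2 = rng.normal(0.0, 1.0, n_per_group)  # true mean difference of zero
    p_tost, _, _ = ttost_ind(x1, x2, -sesoi, sesoi)
    hits += p_tost < 0.05

print(f"Estimated power at N = {2 * n_per_group}: {hits / n_sims:.2f}")
```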

2.2. Measures and Procedure

In the online experiment, participants were asked about their demographics (age, gender, educational level) after completing the informed consent form. Because they were going to read a science journalism article on a biodiversity topic, a short introduction to science communication followed, briefly explaining what it is and who practices it. Afterward, their prior attitude toward wolves was measured with five items based on an existing scale (Treves et al. 2013; 5-point Likert scale; e.g., "The spread of wolves in Germany is a positive trend"). Before they read the article, we randomly assigned participants to one of the two conditions, in which the text was labeled as written either by a human author or by an AI. To make the manipulation clear and to prevent participants from thinking that the AI would draw its information uncontrollably from unclear sources, we briefly explained that the AI could analyze large amounts of data and produce text on this basis without human intervention.
Moreover, as the study was conducted before the launch of ChatGPT, we explained that the AI supposedly took the information for the article by using three reliable sources, which were listed (e.g., the Federal Statistical Office of Germany). The same information was provided for the alleged journalist who was also briefly introduced. The cover story claimed the article was published in 2020 in a German newspaper. For the authorship manipulation, see Appendix A, Appendix B, Appendix C and Appendix D. For the original article, see Appendix E and Appendix F. After reading the text, participants were asked how neutral they perceived the tone of the author to be (bipolar 5-point scale from “absolutely neutral” to “absolutely evaluative”) and answered two manipulation check items concerning the author and content of the article.
The main dependent variable was the perceived message credibility of the text, measured with the Message Credibility scale (Appelman and Sundar 2016; Sundar 1999; 19 items on a 5-point Likert scale). Sample items of this scale are "fair", "accurate", and "authentic". Exploratively, we also measured source credibility with five bipolar items, such as "unbiased—biased" or "not trustworthy—trustworthy" (Flanagin and Metzger 2000; 6-point scale). Furthermore, we asked participants to rate the perceived anthropomorphism (e.g., "machine-like—human-like") and perceived intelligence (e.g., "ignorant—knowledgeable") of the author with five items each (Bartneck et al. 2009; bipolar 5-point scale). Participants' attitude toward wolves was then measured again. Finally, the behavioral intentions to recommend the article to friends or family and to read such articles again were measured with two single items on a 5-point Likert scale. Before being debriefed, participants could indicate who they thought had actually written the article by choosing between "I believe the text presented was actually written by an AI", "I believe the text presented was actually written by a human", and "I am not sure" (true author).

3. Results

3.1. Main Analyses

The means and standard deviations of all measures by labeled authorship can be seen in Table 2. The explorative analysis of the perceived tone of the author revealed no significant difference between the two conditions, Welch-t(726.05) = 0.30, p = 0.762. Participants perceived the author’s tone to be relatively neutral.
To examine H1, an equivalence test with equivalence bounds of ± the smallest effect size of interest (SESOI) of Cohen's d = 0.21 was conducted (Equation (1)). For a detailed description of equivalence testing, see Lakens et al. (2018). The SESOI was determined based on the raw mean difference of 0.1 on a 5-point scale and the observed pooled variance of s_p² = 0.23 from a previous study.
$$ d = \frac{\mu_1 - \mu_2}{\sqrt{s_p^2}} = \frac{0.1}{0.48} = 0.21 \qquad (1) $$
The equivalence test (TOST [two one-sided t-tests] procedure) for perceived message credibility was non-significant, t(732) = 2.03, p = 0.978, and the observed mean difference of 0.2 fell outside the predefined equivalence bounds. Instead, the null hypothesis significance test (NHST) revealed an effect of labeled authorship on message credibility, t(732) = 4.87, p < 0.001, d = 0.36. The text allegedly written by a human author was perceived as more credible than the same text allegedly written by an AI. The results of the TOST procedure are shown in Figure 1. Figure 2 illustrates the distribution of participants' responses regarding message credibility in each condition.
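A TOST of this kind can be run with standard tooling. The following sketch uses statsmodels' ttost_ind on per-participant data of the kind the study collected, together with the complementary NHST and a Welch's t-test as used for H2 and H3 below; the file and column names are hypothetical, not the authors' actual analysis script.

```python
# Sketch of the TOST procedure, the complementary NHST, and a Welch's t-test,
# assuming a data frame with one row per participant; names are illustrative.
import pandas as pd
from scipy import stats
from statsmodels.stats.weightstats import ttost_ind

df = pd.read_csv("experiment_data.csv")  # e.g., an export of the OSF dataset
human = df.loc[df["condition"] == "human", "message_credibility"]
ai = df.loc[df["condition"] == "ai", "message_credibility"]

# Equivalence bounds: SESOI of d = 0.21 converted to raw units, i.e., +/- 0.1
# on the 5-point scale, following the derivation in Equation (1)
p_tost, lower, upper = ttost_ind(human, ai, -0.1, 0.1)
print(f"TOST p = {p_tost:.3f}")  # non-significant: equivalence not established

# Conventional NHST (pooled variances, df = n1 + n2 - 2)
t, p = stats.ttest_ind(human, ai)
print(f"NHST: t = {t:.2f}, p = {p:.4f}")

# Welch's t-test (unequal variances), as used when homogeneity was violated
h = df.loc[df["condition"] == "human", "anthropomorphism"]
a = df.loc[df["condition"] == "ai", "anthropomorphism"]
t_w, p_w = stats.ttest_ind(h, a, equal_var=False)
print(f"Welch: t = {t_w:.2f}, p = {p_w:.4f}")
```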
To examine H2 and H3, two Welch’s t-tests were conducted due to missing homogeneity of variance. Concerning perceived anthropomorphism, the results revealed a significant difference between the conditions, Welch-t(726.44) = 9.15, p < 0.001, d = 0.67. As expected, and observed in previous studies, participants rated the human author as more anthropomorphic than the AI author. The same pattern resulted regarding perceived intelligence, Welch-t(731.45) = 5.57, p < 0.001, d = 0.41. The human author was perceived as more intelligent than the AI author (see Figure 2).
Regarding the open research question of whether the labeled authorship influenced perceived source credibility, we found a significant difference between the conditions, Welch-t(731.39) = 3.25, p = 0.001, d = 0.24. Participants rated the human author as more credible than the AI author. In addition, source and message credibility were highly correlated, with r = 0.82.
Participants also indicated that they would recommend the text by the alleged human author more strongly than the one labeled as written by an AI, Welch-t(731.41) = 2.70, p = 0.007, d = 0.20. Moreover, respondents' intention to read such an article again was higher in the human author condition than in the AI condition, Welch-t(731.82) = 2.64, p = 0.009, d = 0.19. These two items were also highly correlated, with r = 0.73.
Regarding the true author item, 52.65% of the participants in the human author condition believed that a human indeed wrote the text, 21.73% indicated an AI actually wrote it, and 25.63% were unsure. In the AI author condition, 39.47% were convinced that the AI was the actual author, 32.27% thought a human could be the real author, and 28.27% were not sure.

3.2. Further Analyses

Finally, we aimed to explore prior attitudes toward wolves as a covariate in the analysis. However, we refrained from calculating an ANCOVA because the homogeneity of regression slopes was violated for message credibility, as the interaction term was significant, F(1, 730) = 9.33, p < 0.001, ηp2 = 0.01. Figure 3 depicts the interaction effect between participants' prior attitudes toward wolves and labeled authorship, suggesting a positive relationship between participants' initial attitude and perceived message credibility in the human author condition but not in the AI author condition.
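The homogeneity-of-slopes check that precedes an ANCOVA amounts to testing the authorship × prior-attitude interaction in a linear model. A sketch under the same hypothetical column names as above:

```python
# Sketch of the homogeneity-of-regression-slopes check: fit a model with an
# authorship x prior-attitude interaction and inspect the interaction term.
# A significant interaction (as reported above) contraindicates the ANCOVA.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("experiment_data.csv")  # hypothetical data export, as above
model = smf.ols(
    "message_credibility ~ C(condition) * prior_attitude", data=df
).fit()
print(sm.stats.anova_lm(model, typ=2))   # read off the interaction row
```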
Moreover, an ANOVA with repeated measures regarding participants’ attitudes revealed a main effect of treatment, F(1, 732) = 10.99, p < 0.001, ηp2 = 0.02, and a significant interaction effect between treatment and labeled authorship, F(1, 732) = 4.58, p = 0.033, ηp2 = 0.01. Participants’ attitudes toward wolves turned out to be more positive after reading the article, especially when a human author had allegedly written the article.

4. Discussion

The study presented here aimed to extend the existing literature on automated journalism to more complex content in the realm of science communication. Specifically, we aimed to move research on credibility perceptions of AI authorship beyond short news reports to a topic and a writing style that better reflect current AI capabilities. Moreover, as one of the first studies in this area, we tested the previously found null effects of human vs. AI authorship on credibility perceptions by using equivalence testing. We defined the smallest effect size of interest and collected a sample large enough to detect even small effects. This sufficiently large sample and the custom-fit methodology revealed a small but substantial difference in both perceived message credibility and perceived source credibility between an alleged human and an alleged AI author. The findings show that introducing an AI as the author of a text led to lower perceived credibility of both the author and the article, even though the identical text was presented in both conditions and was actually written by an LLM.
In addition, the participants in this experiment were aware of the specific type of author they were dealing with. Besides the fact that all participants included in the analysis passed the manipulation check regarding authorship, the differences in perceived anthropomorphism and perceived intelligence of the respective authors speak to the differentiating perceptions of the readers: the AI author was rated as less realistic, human-like, intelligent, and competent than the human author. Furthermore, participants intended to recommend the provided article less, and they were less willing to read such articles again in the future, when an AI had allegedly written it. Investigating the factors that led to this evaluation and exploring whether credibility can be manipulated for both types of authors via perceived intelligence and perceived anthropomorphism is a task for future research.
This study is oriented toward current journalistic practice, as the text was framed as a science journalism article published in a newspaper. Our experimental setting presented a realistic and forward-looking scenario, as ATG technology has already been used in journalism for several years and will be deployed much more in the future. Current developments around generative AI, especially around ATG, point to a trend toward the increasing use of AI authorship and its participation in the journalistic process.
Moreover, the provision of scientific information by generative AI reflects actual potential uses of tools such as ChatGPT, also on the part of laypersons. With the increased use of generative AI, its use for information search and information gathering will likely increase. People might trust information provided by ChatGPT in a similar way as information obtained from Google or Wikipedia (Jung et al. 2024). In particular, the more subtle presentation of information in continuous text, when the primary aim may not have been to obtain facts, is still to be investigated and should be viewed critically.
Considering the relatively high credibility ratings for both authors in this experiment, the practical implications of an AI-authored article being perceived as slightly less credible than a human-authored one are unclear. Our study was conducted in January 2022, when people's experience with and attitudes toward ATG and NLG might not have been highly developed, owing to the absence of labeling requirements. In addition, there is evidence that people had no clear concept of ATG and that this has not changed considerably since the release of ChatGPT (Bodani et al. 2023; Lermann Henestrosa and Kimmerle 2024). Given that it was a novel experience for participants to see an AI write about a scientific topic to this extent, the findings are intriguing and speak for a basic leap of faith.
Of course, this study provided very transparent conditions from the readers' perspective by declaring the authorship and the authors' alleged sources, which were reputable. It remains uncertain how transparently media organizations will handle AI in the text-production loop in the future. An extensive societal debate, reflecting the potential desire of readers for clear regulations, is needed to address this concern. For completeness, future studies should also ask for people's assessments of the sources, if provided, and consider today's LLMs' weaknesses in providing reliable sources. In addition, a distinction should be made between LLMs that cannot specify sources due to their basic structure and models that can do so and would therefore be more appropriate for journalism. However, because of the lack of experience with automatically written journalism, the authorship introduction was necessary in our experiment to successfully manipulate the respective authorship and effectively compare human vs. AI journalism. Therefore, the results of the final question of who participants ultimately thought was the real author should be interpreted with caution: in the human condition, responses bordered on chance level, and the proportion of people who believed that an AI had written the text was relatively high, given the lack of experience at the time. As technological development is progressing rapidly and debates about the strengths and limitations of ATG are constantly shaping public opinion, this picture might already have changed.
Our exploratory finding was that the credibility of the supposedly human-written article was positively associated with prior attitudes toward wolves, whereas this effect did not extend to the AI author. Additionally, the observation that attitudes toward the topic became more positive after reading the text—particularly with the human author—suggests that not only could the credibility of human authors be perceived as higher, but their persuasiveness may also be greater. Future studies should further investigate the relationship between attitudes toward content and its perceived credibility, with a focus on how factors such as confirmation bias might influence these dynamics.
With the increased use of ATG technology in journalism and other textual content, and with more transparent labeling, future readers will hopefully be much more aware of this novel authorship cue. Moreover, what might complicate, or at least change, the investigation of AI authorship in future research is the increasing co-authorship and, thus, the blurring of roles in the writing process (Cress and Kimmerle 2023; Luther et al. 2024). A fundamental investigation of the perception of AI authorship is, therefore, long overdue, especially since ChatGPT, and anybody with the help of ChatGPT and other LLMs, can write about any topic regardless of the truthfulness of the information. Against this background, our results should be seen as a favorable vote of confidence in a technology that holds great potential. On the other hand, they are a warning signal in view of the obvious deficits of LLMs in consistently delivering reliable and robust scientific information.

5. Conclusions

This study contributes to exploring AI authorship on a scientific topic, considering today's LLMs' language and data-access capabilities. Prior research suggested small or no differences in message credibility between human and AI authorship, which we tested by using equivalence testing and a sample large enough to detect even small effects. While the results revealed that the mere suggestion of AI authorship led to lower credibility ratings of the text and lower author evaluations, major negative effects on credibility were not observed, even for a topic more complex than a weather forecast. Our finding is particularly important in light of the increasing use of LLMs for information searches. Although work is constantly being invested in improving generative AI in terms of factual accuracy, its reliability should be critically scrutinized, especially where scientific information is concerned. More research and a broad public debate are needed to accompany the spread of this specific type of AI and to investigate people's attitudes toward and acceptance of it. In particular, the role of further influencing factors, such as preconceptions and attitudes toward generative AI, should be investigated. Since the awareness that an LLM's answers are based on the calculation of probabilities can significantly influence the assessment of the reliability of the information, credibility perceptions might vary with growing experience of and attention to this area of AI. Therefore, our approach is a step forward in exploring the perception and evaluation of AI authorship for complex topics, reflecting recent developments in NLG.

Author Contributions

Conceptualization, A.L.H. and J.K.; methodology, A.L.H. and J.K.; analysis, A.L.H.; resources, J.K.; data curation, A.L.H.; writing—original draft preparation, A.L.H.; writing—review and editing, A.L.H. and J.K.; visualization, A.L.H.; supervision, J.K.; project administration, J.K.; funding acquisition, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Leibniz-Institut für Wissensmedien (STB Data Science).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of the Leibniz-Institut für Wissensmedien (protocol code LEK 2020/053, approved on 12 November 2020).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original data presented in the study are openly available in OSF at https://osf.io/gpkc6/ (accessed on 14 August 2024).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Authorship Manipulation—Factor Level “AI” (Original)

Im Folgenden werden Sie einen Text zum Thema Wölfe in Deutschland lesen. Dieser erschien im Frühjahr 2020 auf der Wissenschaftsseite der Süddeutschen Zeitung (SZ).
Er wurde vom Computeralgorithmus AutomatedTXT (Version 4.9) verfasst, der Methoden der künstlichen Intelligenz (KI) zur Analyse und Produktion natürlichsprachlicher Texte verwendet. Methoden der künstlichen Intelligenz zur Texterstellung finden bereits seit einigen Jahren Anwendung. AutomatedTXT ist so programmiert, dass er eine große Menge an Daten analysieren und die darin enthaltenen Informationen zu einem Text zusammenfügen kann. Eine Überprüfung durch einen Menschen ist dadurch nicht mehr notwendig.
Für den folgenden Text griff AutomatedTXT auf öffentlich zugängliche Informationen der Dokumentations- und Beratungsstelle des Bundes für den Wolf (DBBW), des Statistischen Bundesamtes sowie des Bundesministeriums für Umwelt, Naturschutz und nukleare Sicherheit (BMU) zurück.

Appendix B. Authorship Manipulation—Factor Level “AI” (English Translation)

Below, you will read a text on the topic of wolves in Germany. It appeared in spring 2020 on the science page of the Süddeutsche Zeitung (SZ).
It was written by the computer algorithm AutomatedTXT (version 4.9), which uses artificial intelligence (AI) methods to analyze and produce natural language texts. Artificial intelligence methods for text creation have been used for several years. AutomatedTXT is programmed to analyze a large amount of data and combine the information it contains into a text. This means that a human inspection is no longer necessary.
For the following text, AutomatedTXT used publicly available information from the Federal Documentation and Advisory Center for Wolves (DBBW), the Federal Statistical Office, and the Federal Ministry for the Environment, Nature Conservation, and Nuclear Safety (BMU).

Appendix C. Authorship Manipulation—Factor Level “Human” (Original)

Im Folgenden werden Sie einen Artikel zum Thema Wölfe in Deutschland lesen. Dieser erschien im Frühjahr 2020 auf der Wissenschaftsseite der Süddeutschen Zeitung (SZ).
Er wurde von Wissenschaftsjournalist Robert B. Meyer (Jahrgang 1971) verfasst. Seine journalistischen Schwerpunkte liegen in den Bereichen Biodiversität, Naturschutz und Meeresbiologie.
Für den folgenden Text griff der Journalist auf öffentlich zugängliche Informationen der Dokumentations- und Beratungsstelle des Bundes für den Wolf (DBBW), des Statistischen Bundesamtes sowie des Bundesministeriums für Umwelt, Naturschutz und nukleare Sicherheit (BMU) zurück.

Appendix D. Authorship Manipulation—Factor Level “Human” (English Translation)

Below, you will read an article on the topic of wolves in Germany. It appeared in spring 2020 on the science page of the Süddeutsche Zeitung (SZ).
It was written by science journalist Robert B. Meyer (born 1971). His journalistic focus is on the areas of biodiversity, nature conservation, and marine biology.
For the following text, the journalist used publicly available information from the Federal Documentation and Advisory Center for Wolves (DBBW), the Federal Statistical Office, and the Federal Ministry for the Environment, Nature Conservation, and Nuclear Safety (BMU).

Appendix E. Article (Original). Regarding the Authorship Manipulation, an Image of a White, Middle-Aged Man or a Symbol Image for an Algorithm (Program Code) Was Displayed; Not Shown for Copyright Reasons

27 February 2020, 17:08 Uhr
Wölfe in Deutschland
Der Wolf breitet sich in Deutschland wieder aus, eine Art, die vor einem Jahrhundert ein Symbol der Angst war und bis zur Ausrottung gejagt wurde. Die Jagd auf den grauen Wolf ist seit 1945 verboten. Heute leben in Deutschland etwa 150 Wölfe, 75 davon in einem Rudel. Die Art beginnt sich wieder vom Zentrum in die Randgebiete des Landes auszubreiten. In den letzten Jahren wurden Wölfe bis nach Hamburg und München gesichtet, was Naturschützer und Jäger gleichermaßen zur Wachsamkeit aufruft.
Die Zahl der Wölfe in Deutschland wird von der Dokumentations- und Beratungsstelle des Bundes für den Wolf (DBBW) überwacht. In einer am Dienstag veröffentlichten Erklärung teilte die DBBW mit, dass es derzeit zwischen 516 und 680 Wölfe in Deutschland gibt–ein leichter Anstieg gegenüber dem letzten Jahr. Das bedeutet, dass die Population nun den höchsten Stand seit dem 19. Jahrhundert erreicht hat.
Es besteht die Sorge, dass sich der Wolf in Deutschland unkontrolliert ausbreiten und Menschen angreifen könnte. Der erste Wolf in Deutschland kehrte 2012 aus Polen zurück und löste eine Kontroverse aus, nachdem er sechs Schafe im Land getötet hatte. Bayerische Schafhalter betonen, dass sie große Verluste erleiden würden, wenn sich solche Angriffe häufen würden. Wildtierschützer sagen, der Wolf habe ein Recht zu leben und sollte als gefährdete Art geschützt werden.
Die Landwirtschaft fordert Schutzmaßnahmen gegen den Wolf in Deutschland. Zumindest in Teilen Brandenburgs und Sachsens wird der Wolf als Bedrohung für Nutztiere angesehen. Diese Sichtweise erscheint Biologen übertrieben. Der Wolf hat in Deutschland noch nie große oder auch nur mittelgroße Schäden an Nutztieren verursacht, aber er wurde schon bejagt, wenn nur das Gerücht über solche Schäden aufkam.
Es gibt keine Gefahr durch den Wolf in Deutschland. “Wir brauchen keine Wolfsjagd”, sagte ein Sprecher der deutschen Grünen, “die Landwirte können sich selbst schützen.” Greenpeace ist der Meinung, dass die Wolfspopulation als wichtiges Element der Artenvielfalt weiterwachsen sollte, sagte Sprecherin Marie-Christine Keßler gegenüber Reuters. “Wir sind der Meinung, dass der Wolf als Art ein Recht auf Existenz hat”, sagte sie. “Wenn die Behörden ihr Großwild schützen wollen, sollten sie das mit naturverträglichen Mitteln tun und nicht durch das Töten von Tieren.”
Robert B. Meyer/AutomatedTXT

Appendix F. Article (English Translation). Regarding the Authorship Manipulation, an Image of a White, Middle-Aged Man or a Symbol Image for an Algorithm (Program Code) Was Displayed; Not Shown for Copyright Reasons

27 February 2020, 5:08 pm
Wolves in Germany
The wolf is spreading again in Germany, a species that was a symbol of fear a century ago and was hunted to extinction. Hunting the gray wolf has been banned since 1945. Today there are around 150 wolves living in Germany, 75 of them in a pack. The species is beginning to spread again from the center to the outskirts of the country. In recent years, wolves have been spotted as far away as Hamburg and Munich, calling for conservationists and hunters alike to be vigilant.
The number of wolves in Germany is monitored by the Federal Documentation and Advisory Center for Wolves (DBBW). In a statement released on Tuesday, the DBBW said there are currently between 516 and 680 wolves in Germany—a slight increase compared to last year. This means the population is now at its highest level since the 19th century.
There is concern that the wolf could spread uncontrollably in Germany and might attack humans. The first wolf in Germany returned from Poland in 2012 and sparked controversy after killing six sheep in the country. Bavarian sheep farmers emphasize that they would suffer major losses if such attacks became more frequent. Wildlife advocates say the wolf has a right to live and should be protected as an endangered species.
Agriculture calls for protective measures against the wolf in Germany. At least in parts of Brandenburg and Saxony, the wolf is seen as a threat to farm animals. This view seems exaggerated to biologists. The wolf has never caused large or even medium-sized damage to livestock in Germany, but it has been hunted whenever rumors of such damage arose.
There is no danger posed by the wolf in Germany. “We don’t need wolf hunting,” said a spokesman for the German Green Party, “farmers can protect themselves”. Greenpeace believes the wolf population should continue to grow as an important element of biodiversity, spokeswoman Marie-Christine Keßler told Reuters. “We believe that the wolf as a species has a right to exist,” she said. “If the authorities want to protect their big game, they should do so using nature-friendly means and not by killing animals.”
Robert B. Meyer/AutomatedTXT

Appendix G. Prompts Entered in GPT-3 Playground

1. The wolf is spreading again in Germany
2. The number of wolves in Germany is monitored by the Federal Documentation and Advisory Service for the Wolf (DBBW).
3. There is concern that the wolf could spread uncontrollably in Germany and might attack humans.
4. Agriculture calls for protection measures against the wolf in Germany.
5. There is no danger posed by the wolf in Germany

References

1. Appelman, Alyssa, and S. Shyam Sundar. 2016. Measuring Message Credibility: Construction and Validation of an Exclusive Scale. Journalism & Mass Communication Quarterly 93: 59–79. [Google Scholar] [CrossRef]
  2. Bartneck, Christoph, Dana Kulić, Elizabeth Croft, and Susana Zoghbi. 2009. Measurement Instruments for the Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of Robots. International Journal of Social Robotics 1: 71–81. [Google Scholar] [CrossRef]
  3. Bodani, Nikita, Abhishek Lal, Afsheen Maqsood, Sara Altamash, Naseer Ahmed, and Artak Heboyan. 2023. Knowledge, Attitude, and Practices of General Population Toward Utilizing ChatGPT: A Cross-Sectional Study. SAGE Open 13: 21582440231211079. [Google Scholar] [CrossRef]
  4. Böhm, Robert, Moritz Jörling, Leonhard Reiter, and Christoph Fuchs. 2023. People Devalue Generative AI’s Competence but Not Its Advice in Addressing Societal and Personal Challenges. Communications Psychology 1: 32. [Google Scholar] [CrossRef]
  5. Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, and Amanda Askell. 2020. Language Models Are Few-Shot Learners. arXiv arXiv:2005.14165. [Google Scholar]
  6. Cress, Ulrike, and Joachim Kimmerle. 2023. Co-Constructing Knowledge with Generative AI Tools: Reflections from a CSCL Perspective. International Journal of Computer-Supported Collaborative Learning 18: 607–14. [Google Scholar] [CrossRef]
7. Flanagin, Andrew J., and Miriam J. Metzger. 2000. Perceptions of Internet Information Credibility. Journalism & Mass Communication Quarterly 77: 515–40. [Google Scholar] [CrossRef]
  8. Graefe, Andreas, and Nina Bohlken. 2020. Automated Journalism: A Meta-Analysis of Readers’ Perceptions of Human-Written in Comparison to Automated News. Media and Communication 8: 50–59. [Google Scholar] [CrossRef]
  9. Jang, Wonseok, Jung W. Chun, Soojin Kim, and Young W. Kang. 2021. The Effects of Anthropomorphism on How People Evaluate Algorithm-Written News. Digital Journalism 22: 103–24. [Google Scholar] [CrossRef]
  10. Jung, Yongnam, Cheng Chen, Eunchae Jang, and S. Shyam Sundar. 2024. Do We Trust ChatGPT as Much as Google Search and Wikipedia? In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. Honolulu: ACM, pp. 1–9. [Google Scholar] [CrossRef]
  11. Köbis, Nils, and Luca D. Mossink. 2021. Artificial Intelligence versus Maya Angelou: Experimental Evidence That People Cannot Differentiate AI-Generated from Human-Written Poetry. Computers in Human Behavior 114: 13. [Google Scholar] [CrossRef]
  12. Lakens, Daniël, Anne M. Scheel, and Peder M. Isager. 2018. Equivalence Testing for Psychological Research: A Tutorial. Advances in Methods and Practices in Psychological Science 1: 259–69. [Google Scholar] [CrossRef]
  13. Lermann Henestrosa, Angelica, and Joachim Kimmerle. 2024. Understanding and Perception of Automated Text Generation among the Public: Two Surveys with Representative Samples in Germany. Behavioral Sciences 14: 353. [Google Scholar] [CrossRef]
  14. Lermann Henestrosa, Angelica, Hannah Greving, and Joachim Kimmerle. 2023. Automated Journalism: The Effects of AI Authorship and Evaluative Information on the Perception of a Science Journalism Article. Computers in Human Behavior 138: 107445. [Google Scholar] [CrossRef]
  15. Luther, Teresa, Joachim Kimmerle, and Ulrike Cress. 2024. Teaming up with an AI: Exploring Human–AI Collaboration in a Writing Scenario with ChatGPT. AI 5: 1357–76. [Google Scholar] [CrossRef]
  16. Oehler, Felicitas, Sophia Kimmig, Robert Hagen, Joachim Kimmerle, Ulrike Cress, Klaus Hackländer, Janosch Arnold, Danny Flemming, and Miriam Brandt. 2024. The Role of Information Presentation for Wildlife Knowledge, Attitude, and Risk Perception. Conservation Science and Practice 6: e13089. [Google Scholar] [CrossRef]
  17. Proksch, Sebastian, Julia Schühle, Elisabeth Streeb, Finn Weymann, Teresa Luther, and Joachim Kimmerle. 2024. The Impact of Text Topic and Assumed Human vs. AI Authorship on Competence and Quality Assessment. Frontiers in Artificial Intelligence 7: 1412710. [Google Scholar] [CrossRef]
18. Sundar, S. Shyam. 1999. Exploring Receivers’ Criteria for Perception of Print and Online News. Journalism & Mass Communication Quarterly 76: 373–86. [Google Scholar] [CrossRef]
  19. Tandoc, Edson C., Lim J. Yao, and Shangyuan Wu. 2020. Man vs. Machine? The Impact of Algorithm Authorship on News Credibility. Digital Journalism 8: 548–62. [Google Scholar] [CrossRef]
  20. Treves, Adrian, Lisa Naughton-Treves, and Victoria Shelley. 2013. Longitudinal Analysis of Attitudes Toward Wolves. Conservation Biology 27: 315–23. [Google Scholar] [CrossRef]
  21. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. Advances in Neural Information Processing Systems 30. Available online: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (accessed on 14 August 2024).
  22. Wang, Sai, and Guanxiong Huang. 2024. The Impact of Machine Authorship on News Audience Perceptions: A Meta-Analysis of Experimental Studies. Communication Research, 00936502241229794. [Google Scholar] [CrossRef]
  23. Wölker, Anja, and Thomas E. Powell. 2021. Algorithms in the Newsroom? News Readers’ Perceived Credibility and Selection of Automated Journalism. Journalism 22: 86–103. [Google Scholar] [CrossRef]
Figure 1. Results of the equivalence test on message credibility. The area between the vertical dashed lines represents the a priori determined smallest effect size of interest (SESOI).
Figure 2. Boxplots of the dependent variables anthropomorphism, intelligence, and message credibility by labeled authorship.
Figure 3. Regression lines depicting the relationship between prior attitude toward the article’s topic and perceived message credibility by labeled authorship.
Table 1. Absolute and relative (in percent) numbers of participants by gender and educational level.

                        N      %
Gender
  Male                  303    41.28
  Female                420    57.22
  Not specified         11     1.50
Educational level
  Low                   4      0.54
  Middle                114    15.53
  High                  616    83.92
Table 2. Means and standard deviations of all measures by labeled authorship.

                          Human (n = 359)    AI (n = 375)
Variable                  M       SD         M       SD
Author’s tone             2.42    0.99       2.41    0.92
Message credibility       3.82    0.53       3.62    0.58
Anthropomorphism          3.80    0.77       3.24    0.88
Intelligence              4.09    0.68       3.80    0.73
Source credibility        4.54    0.96       4.31    1.03
Intention to recommend    3.09    1.16       2.86    1.18
Intention to read         3.75    1.09       3.54    1.12
Prior attitude            3.10    0.38       3.11    0.35
Posterior attitude        3.16    0.36       3.12    0.36
