Research Background
Sustainable Development Goal (SDG) 4 of the Global Education Agenda aims to “ensure inclusive and equitable quality education and promote lifelong learning opportunities for all” by 2030. The sharing mechanisms of social platforms allow users to spread messages rapidly and at short intervals, and the public can integrate content from different platforms on the basis of its nature [1]. A similar pattern can be observed in content farm messages: to achieve a large-scale reposting effect, such messages often carry sensational headlines that increase reader click-through rates.
Content farms are websites that produce large quantities of low-quality articles on a wide variety of topics, using keywords to raise the ranking of their pages in search engine results. Their business model is based on placing advertisements or selling items on the pages or within the content to generate revenue. Commercially, a webpage’s profit derives from the volume of its traffic: the higher the traffic, the greater the profit. To attract more readers, content farms create articles on highly popular topics and add exaggerated titles and content, such as modified or fake images. As a result, readers consume these articles without knowing the truth [2].
Content farms usually produce articles that (1) may be short and not in the format of news reports, with simple language and few citations or links; (2) are filled with advertisements; (3) contain content copied or modified from other websites; and (4) contain many external links [3]. These features indicate that the content of the website is not credible. However, general audiences have difficulty judging credibility because of the sheer quantity of information and often accept such content as true. A sketch of how some of these surface features could be counted is given below.
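Purely as an illustration, the following Python sketch counts features (1), (2), and (4) from a page’s HTML; feature (3), copied or modified content, requires comparison against a corpus of other sites and is omitted. The function name, the use of the BeautifulSoup library, and the iframe count as an advertisement proxy are our assumptions, not details drawn from [3].

```python
# Illustrative sketch only; not the feature set used in this study.
from urllib.parse import urlparse

from bs4 import BeautifulSoup  # third-party HTML parser


def content_farm_signals(html: str, page_url: str) -> dict:
    """Count simple surface signals of content farm articles on one page."""
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(" ", strip=True)
    page_host = urlparse(page_url).netloc

    links = [a.get("href", "") for a in soup.find_all("a")]
    external = [u for u in links if urlparse(u).netloc not in ("", page_host)]

    return {
        "article_length": len(text),                      # (1) short articles
        "ad_like_iframes": len(soup.find_all("iframe")),  # (2) advertisement proxy
        "internal_links": len(links) - len(external),     # (1) few citations/links
        "external_links": len(external),                  # (4) many external links
    }
```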
Messages that have been plagiarised and reposted without authorisation lose their credibility. Beyond being false, such articles can have a substantial effect if their content is used to advocate for a specific purpose, event, or person; during disasters, this can cause social unrest [4]. In addition, content farms can be hired to make politicians and their teams appear more popular. With the growing influence of social media, the manipulation of data through social media has become an efficient method of influencing users [5].
Because of the nature of content farm messages, evaluating messages is a complex and multifaceted task. Strategies can be used to determine whether messages are credible in terms of their content, writing style, dissemination path, and organisational credibility. News-related features (such as the title, contents, and author of an article) and social features (such as responses, dissemination paths, and platforms) can be used to perform message discrimination. However, messages can comprise text, multimedia, or internet articles, and therefore require appropriate techniques and resources [6]. One way to organise these two feature groups is sketched below.
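As a hedged sketch only, the two feature groups named above could be organised as simple records; the field names and types here are illustrative assumptions, not a schema taken from [6].

```python
# Illustrative feature schema; field names are our assumptions.
from dataclasses import dataclass


@dataclass
class NewsFeatures:
    title: str                     # headline of the article
    contents: str                  # article body text
    author: str                    # byline, if any


@dataclass
class SocialFeatures:
    responses: int                 # e.g., comment or reaction count
    dissemination_path: list[str]  # accounts/pages that reshared the message
    platform: str                  # platform on which the message appeared


@dataclass
class Message:
    news: NewsFeatures
    social: SocialFeatures
```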
Educational organisations in different fields have drawn on media literacy to establish guidelines for message discrimination. Audiences can follow the fact-checking steps provided by the Harvard University Library [7] and the International Federation of Library Associations and Institutions (IFLA) [8] to maintain a discerning attitude towards messages and perform multifaceted verification of their sources with the available tools. These guidelines, however, show that discriminating among messages is a complex process that requires applying the appropriate techniques to each type of message; message discrimination is not an easy task. It also tests the audience’s media literacy, namely whether they can maintain a discerning attitude and discriminate among messages.
Because of the difficulty of discriminating among messages, various countries have invested in research on artificial intelligence (AI)-based learning and discrimination to assist audiences. The contents and titles of content farm messages have identifiable features that distinguish them from truthful messages [9]. In 2019, the University of Washington developed Grover, a message discrimination system that uses deep learning to learn the features of fake messages and achieves a discrimination accuracy of 92% [10].
However, no research has been conducted on the use of AI systems to discriminate among Chinese-language messages. Moreover, although English-language message databases have been created, no similar database exists for Chinese content. This study therefore built a system that collects data from the internet (comprising Uniform Resource Locators (URLs), domains, and web address information) and established a terms database. The content of news articles and news-related information was segmented and analysed to train an AI model, yielding a message discrimination system for Chinese content. A sketch of the URL and domain collection step follows.
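A minimal sketch of that collection step, under our own assumptions: the domain of each article URL is extracted and checked against a locally maintained list of suspect domains. The file name and matching rule are illustrative, not the study’s actual resources.

```python
# Illustrative sketch of URL/domain collection; names are assumptions.
from urllib.parse import urlparse


def load_domain_list(path: str = "suspect_domains.txt") -> set[str]:
    """Read one domain per line from a locally maintained list."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def is_known_suspect(url: str, domains: set[str]) -> bool:
    """Match the exact host or any parent domain (news.example.com -> example.com)."""
    parts = urlparse(url).netloc.lower().split(".")
    return any(".".join(parts[i:]) in domains for i in range(len(parts)))
```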
The AI news source credibility identification system developed in this study discriminates among messages on the basis of writing style: content farm messages contain a higher proportion of adjectives and adverbs than regular news. This feature can serve both as a basis for helping the audience to discriminate among messages and as an input for the system’s learning, as sketched below.
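As one plausible implementation, and only as a sketch, the adjective/adverb proportion could be computed with the third-party jieba library for Chinese segmentation and part-of-speech tagging; the tag prefixes used here (‘a’ for adjectives, ‘d’ for adverbs in the ICTCLAS tagset) and the ratio definition are our assumptions rather than the study’s exact procedure.

```python
# Illustrative sketch; not the study's exact feature computation.
import jieba.posseg as pseg  # third-party Chinese segmenter with POS tags


def adj_adv_ratio(text: str) -> float:
    """Proportion of adjective and adverb tokens among all tagged tokens."""
    tokens = list(pseg.cut(text))
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok.flag.startswith(("a", "d")))
    return hits / len(tokens)
```

A comparatively high ratio on an incoming article would then count as one signal that its style resembles content farm writing rather than regular news.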
In addition to developing the system, this study conducted a media literacy course. Use of the system served as the basis for determining whether learning media literacy with the system positively affected the users’ learning effectiveness and attitudes towards media literacy. The researchers designed both the courses and the test items used to evaluate the users’ results. The questionnaires were based on learning attitude theory to determine the users’ attitudes towards learning media literacy after they used the system as a discrimination tool.
On the basis of the aforementioned background and this study’s objective, the following two research questions were proposed:
(1) After using the AI news source credibility identification system, did the experimental group have higher learning effectiveness than the control group?
(2) After using the AI news source credibility identification system, did the experimental group have a more positive learning attitude than the control group?