#### **2. Literature Review**

CSR has been discussed for more than half a century. Carroll [11] proposed a pyramid model of CSR in which firms' legal, ethical, and philanthropic responsibilities rest on a foundation of economic responsibility. Since Carroll's seminal work, definitions of CSR have often included managerial and social terms such as influence, social impact, system, and strategy. For example, Aguilera et al. [12] argued that CSR is a long-term strategy to realize universal values in strategy and management. In this view, CSR is a leading business activity that goes beyond simple charity because it fundamentally changes strategy, management, corporate culture, and even corporate identity.

Various definitions of CSR in the literature also share a view of harmonious economic, social, and environmental development, with a common emphasis on non-financial performance [13,14]. The key elements of non-financial performance are characterized as ESG, which consists of environmental, social, and governance criteria and their sub-factors, used to evaluate investments based on companies' responsible impacts [15]. The importance of ESG performance has been emphasized in recent practice because non-financial performance based on ESG factors positively affects corporate sustainability [16–18]. Barko et al. [19] argued that a company that does not consider economic, social, and environmental aspects together will engage in unsustainable management and be at risk.

In recent years, the discourse demanding that companies take on public, ethical, and philanthropic responsibilities as part of their social responsibility has been growing, and the focus of CSR is gradually extending to the concept of corporate citizenship. The concept of corporate citizenship assumes that companies, like citizens in modern society, have an obligation to devote themselves to public goods [5]. Corporate citizenship therefore refers to a series of socio-economic activities that firms perform to fulfill their roles and obligations as members of society [6]. Previous studies have used the terms CSR and corporate citizenship interchangeably (e.g., [9,20,21]), following the equivalent view of the two concepts proposed by Matten and Crane [5].

Several academic studies and practitioners' articles have attempted to analyze the similarities and differences between CSR and corporate citizenship and between CSR and ESG. For example, corporate citizenship emphasizes the management of internalities (i.e., companies' rights and duties), while CSR focuses on the management of externalities for companies [10]. ESG helps measure or quantify social initiatives, while CSR makes companies accountable for their social commitments in a qualitative way [22]. Rendtorff [23] discussed corporate citizenship, CSR, and corporate governance in terms of business legitimacy, emphasizing the different stages of cognitive, pragmatic, and moral legitimacy for proactive corporate citizenship. Costa et al. [24] considered ESG a tool to control sustainability practices and claimed that CSR, environmental management, and value creation are interconnected in achieving corporate sustainability.

As various concepts and approaches relevant to corporate sustainability have been addressed in the literature, there have been attempts to employ text mining to characterize sustainability concepts in large sets of relevant articles or documents. Mazza et al. [25] identified 11 CSR-related topics by applying the Latent Dirichlet Allocation (LDA) method to the CSR communication data of five energy companies on Twitter. Goloshchapova et al. [26] analyzed European and United Kingdom CSR reports through LDA to extract underlying topics, and identified commonly addressed topics as well as sector-specific topics in CSR reports. Kiriu and Nozaki [27] employed text mining based on word frequency and divergence to characterize the ESG activities stated in Japanese CSR reports. Parra et al. [28] applied supervised and unsupervised machine learning methods to text data from the corporate citizenship reports of seven major American companies to identify how corporate citizenship issues have been handled over time.

The aforementioned efforts to distinguish between different corporate sustainability terminologies have helped researchers understand various social responsibility and accountability concepts. However, previous studies defined those concepts subjectively and assigned related corporate sustainability theories to each terminology based on theoretical similarities and differences perceived from each author's individual point of view. Unclear boundaries across CSR, ESG, and corporate citizenship therefore remain a major concern, owing to the lack of in-depth discussion that comprehensively structures those concepts from an objective point of view. In this regard, this study uses a more scientific and quantitative approach, text mining, to analyze a number of research articles associated with three different labels (CSR, ESG, and corporate citizenship) and to effectively and objectively capture the underlying concepts addressed in these three research fields.

#### **3. Methods**

This study characterizes the distinct properties of CSR, ESG, and corporate citizenship by extracting keywords and latent topics from the relevant literature through two text mining techniques: term-frequency analysis [29] and CTM (Correlated Topic Modeling) [30]. CTM extracts latent topics from a document set through probabilistic modeling of term frequency in the document set, while allowing for possible correlations between the latent topics [30]. Meta-analysis, among existing literature review methods, mostly categorizes and conceptualizes key topics in a literature set in an ad hoc manner or from prior knowledge, as a top-down approach [31]. In contrast, a text mining approach can derive underlying characteristics and topics from the objective text information of an abundant literature set; meaningful contexts in a large document set that are difficult to capture manually can be effectively extracted for analysis. From this point of view, this study analyzes the concepts and contexts of corporate citizenship, CSR, and ESG based on the text information in large sets of relevant literature, with the aim of clarifying the differences among the three.
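For reference, the generative process usually given for CTM [30] can be sketched as follows. The notation is introduced here for illustration and is not taken from this study: $K$ is the number of topics, $\mu$ and $\Sigma$ are the mean and covariance of a Gaussian prior, $\beta_k$ is the term distribution of topic $k$, and $w_{d,n}$ is the $n$-th term of document $d$.

$$
\begin{aligned}
\eta_d &\sim \mathcal{N}(\mu, \Sigma), \\
\theta_{d,k} &= \frac{\exp(\eta_{d,k})}{\sum_{j=1}^{K} \exp(\eta_{d,j})}, \qquad k = 1, \dots, K, \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(1, \theta_d), \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Multinomial}(1, \beta_{z_{d,n}}).
\end{aligned}
$$

The off-diagonal entries of $\Sigma$ are what allow topic proportions to be correlated, which the Dirichlet prior of LDA does not provide. Some implementations fix one component of $\eta_d$ to zero for identifiability.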

First, SSCI (Social Sciences Citation Index) journal articles with the keywords "corporate citizenship", "CSR", and "ESG" were identified through the Web of Science article database [32]. A total of 1235 journal articles (i.e., 701 articles for CSR, 296 articles for ESG, and 238 articles for corporate citizenship) published from 1990 to June 2021 were considered for text mining. The title, keywords, and abstract of the articles in each group (i.e., CSR, ESG, and corporate citizenship) were extracted and saved into "txt" files, which served as the text mining input data for CSR, ESG, and corporate citizenship, respectively.

Then, the following pre-processing procedure was performed to refine the original text data for text mining [33]. First, raw text data from the original documents were refined by removing unnecessary elements (i.e., extraneous characters, figures, numbers, punctuation, and whitespace). Then, stop-words that do not provide meaningful information in text (e.g., "a" and "the") were removed from the text data. In addition, words with the same root were transformed to the same term (i.e., stemming). Terms that appear frequently in a small group of documents are more distinctive features than terms that occur in all the documents [34]. To reflect this, terms occurring very frequently across articles were removed based on the term frequency-inverse document frequency (TF-IDF) method, in which the importance of a term decreases as the number of documents containing the term increases [35]. The pre-processing procedure was performed on the text data of each literature group by following the manual of the *tm* package for the R statistical software [36] and the guidelines provided by Grün and Hornik [37].
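The pre-processing steps described above can be illustrated with a minimal Python sketch. It is not the code used in this study, which relied on the *tm* package for R; the stop-word list, the crude suffix-stripping stemmer, the mean TF-IDF criterion, and the threshold value are all assumptions made only for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "for", "is", "are", "on"}


def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop everything except letters, remove stop-words, then stem."""
    tokens = re.sub(r"[^a-z]+", " ", text.lower()).split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def mean_tfidf(docs):
    """Mean TF-IDF of each term over the documents that contain it."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)                      # relative term frequency
            idf = math.log2(n_docs / doc_freq[term])   # 0 if the term is in every doc
            totals[term] += tf * idf
    return {term: totals[term] / doc_freq[term] for term in totals}


def filter_by_tfidf(docs, threshold):
    """Keep only terms whose mean TF-IDF is at least `threshold` (hypothetical cut-off)."""
    scores = mean_tfidf(docs)
    return [[t for t in doc if scores[t] >= threshold] for doc in docs]


# Hypothetical toy abstracts, not data from this study.
raw = [
    "Firms publish CSR reports.",
    "CSR reporting of firms in 2021!",
    "ESG investing and CSR.",
]
docs = [preprocess(text) for text in raw]
filtered = filter_by_tfidf(docs, threshold=0.01)
```

In this toy example the stem "csr" occurs in every document, so its inverse document frequency is zero and it is dropped, while terms confined to a subset of the documents are kept.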

Through the above pre-processing procedure, a document-term matrix representing each literature group was generated with the *tm* package for text mining. As an initial text mining analysis, frequently occurring terms in each literature group were investigated: word clouds visualizing the top 50 frequent stemmed terms, and the top 20 frequent terms within each literature group, were analyzed to identify the characteristics of each group. Next, the *topicmodels* package for the R statistical software [38] was applied to the pre-processed text data to run CTM and extract hidden topics within each literature group. CTM is based on an unsupervised machine learning algorithm in which the number of topics must be specified before modeling. This study set the number of topics to five to facilitate the interpretation and comparison of the topic modeling results. To interpret the latent topics, this study further reviewed the frequent terms and articles associated with each topic derived from the CTM results.
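As a rough illustration of the first analysis step, the following Python sketch builds a document-term matrix from tokenized documents and ranks terms by total frequency. The function names and the toy documents are hypothetical, and the sketch does not reproduce the *tm* or *topicmodels* workflow used in this study; fitting the CTM itself is not shown.

```python
def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for term in doc:
            matrix[i][index[term]] += 1
    return vocab, matrix


def top_terms(vocab, matrix, k):
    """Return the k most frequent terms overall; ties are broken alphabetically."""
    totals = [sum(column) for column in zip(*matrix)]
    ranked = sorted(zip(vocab, totals), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


# Hypothetical pre-processed (tokenized and stemmed) documents.
docs = [
    ["firm", "report", "social"],
    ["firm", "invest", "govern", "invest"],
    ["social", "firm", "citizen"],
]
vocab, dtm = document_term_matrix(docs)
print(top_terms(vocab, dtm, 3))  # [('firm', 3), ('invest', 2), ('social', 2)]
```

A ranking of this kind is the input to a word cloud or a top-20 term list; the same matrix of counts is also the usual input format for topic models.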
