This text-analytics-based research study delves into the sustainability disclosures of companies to understand the expansive sustainability issues, practices, and trends. The origin of the disclosures’ data is the website of Ceres, a sustainability-focused, non-profit concern that has made available several searchable databases (
https://tools.ceres.org/resources/tools/sec-sustainability-disclosure/ (accessed on 1 September 2021)) that offer a rich source for analysis. According to Ceres, its SEC Sustainability Disclosure Search tool can assist in gaining insight into how companies handle the perils as well as the benefits that may arise from issues such as climate change, carbon asset risk, water availability and quality, and hydraulic fracturing (
ceres.com). The tool searches the text of SEC annual filings (10-K, 20-F, and 40-F) and identifies pertinent disclosures. According to Ceres, it then extracts the relevant information, identifies the location of the disclosure in the filing, and makes it all available in the searchable database (
ceres.com) [
84,
85]. To this end, our data are from a legitimate source, namely the Securities and Exchange Commission, a government entity. The database makes it easy for researchers to access the disclosures via an API and be able to analyze the large amount of text data to gain insight into corporate sustainability practices (
https://www.ceres.org/resources/tools/sec-sustainability-disclosure-search-tool (accessed on 1 September 2021)). Such a tool is also useful to investors and other stakeholders interested in knowing more about the opportunities and risks presented by climate change and other ESG issues in companies. According to Ceres, these disclosures in the SEC filings can be compared with the voluntary disclosures that are made via other channels, such as those made to non-profits. To reiterate, machine learning and text analytics together offer powerful analytical tools to interpret and study a large corpus of textual data in a productive way [
26,
32,
83]. The understanding accrued from the study would guide top executives, shareholders, institutional and individual investors, and watchdogs in making informed decisions. The text data were retrieved using the Ceres API and cleaned using advanced data-cleansing methods. Preprocessing steps were then applied. We focused only on the disclosures relating to climate change risks filed between 2011 and 2020. The disclosure information returned by the API, such as the company name, division, industry, year, and ticker, together with the disclosure content, was in JSON format. Therefore, Python was used to extract all file name numbers and the climate change risk disclosure content, where available. Data were retrieved as of 30 September 2020. There were a total of 53,418 files (disclosures) provided by 5347 companies. However, only 48% of the files mentioned climate change risks, so machine-learning-based text analysis techniques were applied only to those 25,428 files. First, we conducted descriptive data analysis to obtain a panoramic view of the disclosures. Next, we conducted topic modeling and other text analytic methods. For example, we constructed a term frequency–inverse document frequency (TF-IDF) model and applied K-means clustering to group industries with similar foci.
Figure 1 outlines our overall research methodology.
3.1. Text Analytics
Next, machine-learning-based text analytics was applied to examine the disclosures. To apply ML and text analytics, considerable data engineering and data-preprocessing work was required. The retrieved data for each disclosure were saved as a text file prior to vectorization. A small number of disclosures was deleted due to a lack of relevance/materiality or broken links. A caveat with text data is that it complicates predictive analytic modeling in two ways. First, textual data cannot be used as input to many precise numerical or quantitative models. Therefore, an NLP program was run to transform the textual content into distinct component elements for further scrutiny. Second, text-based datasets are larger in volume than quantitative datasets. Therefore, a robust approach requires extracting the most relevant chunks of data. In the data engineering step, the granular information in the summary section was converted into simple text output (files) through the removal of typical punctuation marks, spaces, numbers, and standard stop words using the Natural Language Toolkit (NLTK). Next, the text was changed to lowercase using NLTK and TextBlob (
https://textblob.readthedocs.io/en/dev/ (accessed on 1 September 2021)) for uniformity. Tokenization was applied to this engineered data to break up the sequence of strings into word groups [
18,
81]. Next, lemmatization was carried out to reduce words to their root form (e.g., “bought” and “buying” were replaced with “buy”). Lemmatization aggregates the various inflected forms of each word, which facilitates analyzing a collection of related words as a single item [
18,
81]. The benefit lies in finding meaning within and between the composite words. The pandas package (
https://pandas.pydata.org/) (accessed on 1 September 2021) was used to screen and arrange the text files into data frames for ready analysis. The sklearn package (
https://scikit-learn.org/stable/index.html (accessed on 1 September 2021)) was then used for the subsequent modeling steps. When planning text analytics, a natural first choice is one of the most rudimentary NLP models, the Bag of Words. However, such models fail to capture the syntactic relationships between words. For example, a TF-IDF (term frequency–inverse document frequency) score calculated from a plain Bag of Words cannot capture the difference between “I live in the United States,” where “states” is a noun, and “He states his intention,” where “states” is a verb. Part-of-speech (POS) tagging is the method by which words in a corpus are marked up with the corresponding part-of-speech tag based on their definition and context. For example, in the sentence “Give me your answer,” answer is a noun, but in the sentence “Answer the question,” answer is a verb. POS tagging can therefore differentiate words based on their functions in sentences. There are 36 different POS tags, the most common being adjective, verb, noun, proper noun, and adverb. In this study, POS tagging was applied to the preprocessed tokens, and only tokens tagged as nouns or proper nouns were retained. The rationale is that nouns and proper nouns carry more meaningful information than words with other tags.
Figure 3 shows the top 55 nouns and proper nouns and their frequencies in the overall disclosure reports from 2011 to 2020. For a detailed count of each term, please refer to
Appendix B.
As shown in
Figure 3, the darker the color and the larger the size, the more frequently the word appears in the overall disclosure report. It is obvious that energy-related words, such as gas, emission, power, and oil, formed a substantial portion of the content. Climate-change-related words, such as greenhouse, ghg, water, air, and carbon dioxide, were also quite significant. Furthermore, regulation-related terms, such as legislation, compliance, regulation, and agreement, were prominent as well.
The term frequency–inverse document frequency (TF-IDF) technique mentioned before was next applied to calculate the weight of each term, denoting its relative importance in a document. This is an information retrieval approach that assigns each term two scores: its term frequency (TF) and its inverse document frequency (IDF). The weight of the term is then computed as the product of these two scores. TF-IDF is a statistical modeling tool used to assess the value of a word to a document within a corpus. The value of a word increases with the frequency of its appearance in a document [
29,
44]. Simultaneously, it decreases in inverse proportion to the frequency of its presence in the corpus. The primary thrust of TF-IDF is that when a word or phrase appears frequently in one document but seldom shows up in other documents, it has good discriminative power and is well suited to classification problems. In this study, the companies’ disclosures were divided into 31 industry groups to calculate the TF-IDF value for each of the 55 extracted keywords. The resulting matrix is a data frame of size 31 × 55.
Figure 4 displays partial examples of the final matrix.
There were a total of 31 rows representing the 31 different industry groups. Each row contained 55 values, and a high value for a keyword meant that a high term frequency was weighted by the rarity of the keyword’s occurrence across industries. Subsequently, a K-means clustering model was developed to surface the key sustainability concepts. K-means, a widely used ML algorithm for clustering, classifies cases based on similarity measures (i.e., the distance between the cases). It is generally applied to clustering and pattern recognition problems. For this study, six high-level clusters were identified with a word cloud utility. Classification with the KNN classifier (a supervised machine learning method) ensued next. In this application, the data were partitioned into two sets (training and testing datasets) to assess the success of the classifying algorithm. The rationale is to recognize the varied subclasses (i.e., the clusters) in the data such that the observations (cases) in the same subclass are alike, while the observations in different clusters are distinct from each other. The TF-IDF features of the top 55 keywords for each industry were extracted as the model input variables. K-means clustering was implemented to obtain the clusters, which were then labeled numerically. Since the goal was to find the patterns of keywords that describe different industries, the K-means algorithm was used to build a segmentation of six clusters from among the 31 industry groups, which contained 4461 companies. The model was evaluated using the silhouette score. The silhouette score is an average measure of how close each point in one cluster is to points in the neighboring clusters, and a positive value indicates the point has been assigned to an appropriate cluster. The six-cluster model had an acceptable average silhouette score of 0.4, as shown in
Figure 5.
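Fitting K-means on the industry-level TF-IDF rows and checking the silhouette score can be sketched as below. The synthetic matrix stands in for the study's 31 × 55 TF-IDF data frame; the cluster count of six follows the text.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the 31-industry x 55-keyword TF-IDF matrix.
X, _ = make_blobs(n_samples=31, n_features=55, centers=6,
                  cluster_std=1.0, random_state=42)

# Six clusters, as chosen in the study.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Average silhouette score: near 1 = well separated, near 0 = overlapping.
score = silhouette_score(X, labels)
```

With real TF-IDF rows in place of the synthetic blobs, `labels` gives the cluster assignment per industry and `score` the average silhouette value reported in Figure 5.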
In
Figure 6, the taxonomy of clusters and keywords is displayed. The six different clusters are shown in different colors and shapes, and for each cluster, the most important keywords are identified. One can describe each cluster based on its keywords. The silhouette score is used to appraise the quality of a clustering produced by standard algorithms such as K-means. The issue addressed is the degree to which observations are clustered together with other observations that are similar to one another. The value of the silhouette score varies from −1 to 1. When the score is 1, the clusters are distinct and sharply delineated. A value close to 0 indicates overlapping clusters, with observations lying near the decision boundary between neighboring clusters. In this study, the derived silhouette score of 0.4 indicated the clusters were reasonably distinct.
In total, there were six clusters, marked with different colors and shapes. The number of shapes in each cluster represents the number of industries grouped into that cluster.
As observed in
Figure 6, the blue circle represents the top five key features in cluster 1. These are different from the other five clusters.
Table 4 gives detailed information about each cluster, the key features, the number of industries, and the specific industries.
Besides common key features, such as gas, ghg, and climate, industries in different clusters have their own specific focus. For example, oil/gas manufacturing and consumption industries, such as oil and gas, transportation, and aerospace and defense are in cluster 1 with the unique key feature “oil.” An eclectic group of industries, representing “climate” change, are in cluster 2. One could extrapolate that all industries are concerned about climate change. Indeed, several industries fall into this cluster. Cluster 3 is labeled as “energy,” and the main industries in this cluster are electric power and gas utilities. Cluster 4 has the greatest number of industries and is associated with “impact,” which has a couple of interpretations. One, companies are concerned about the impact of climate change and environmental degradation on their business and on the planet. Two, companies have been concerned in recent years with the effect of climate change and the environment on financial performance and shareholder value. In cluster 5, we see some association with “cost,” likely related to the loss and damage incurred due to climate change and an adverse environment (e.g., the negative effect of extreme weather).
Heavy emission industries, such as automobile and coal mining, are in cluster 6 with the unique key feature “ghg.” When compared to the “issues” available at
ceres.com, climate change, carbon, and water are common topics, while this study surfaced “impact” (of sustainability) as a key topic.
As discussed previously, a text-based dataset contains rich, yet complicated information compared to numerical datasets, so a data-preprocessing step was needed to tease the relevant information out of the lengthy text. Second, besides TF-IDF, several other textual analysis methods, such as sentiment analysis, POS tagging, and doc2vec, were also applied.
To gain insight into the regulatory aspects mentioned by companies in their sustainability disclosures, we identified the statutes typically cited for compliance by extracting law-related phrases using spaCy (
https://spacy.io/ (accessed on 1 September 2021)), a natural language processing library. spaCy allows one to extract chunks of text using POS tagging annotations or a phrase pattern. For example, to extract all adjectives followed by a noun, one can specify {POS:ADJ},{POS:NOUN}, and the library will return all the matched results. Specifically, we used the rule-based Matcher engine to perform token-based matching and find descriptive words for “laws,” “Act,” and “regulations.” We chose to extract words connected with “Act” because they were domain specific to law. For example, this gives “Clean Water Act” or “Federal Power Act” instead of vague law phrases, such as “related requirements” or “state law” (a caveat being that one would not know which state). At first, POS tags were used to define the matching patterns.
However, we noticed some specific statute or act names were abbreviated or contained values such as “10-K SEC Act,” which could not be captured by the POS tagging patterns. Given that the descriptive words before “Act” always modify it in this case, another approach was taken: a wildcard token (a token with wildcard logic) was set as the matching pattern, which tells the matcher to search for and extract any two or three tokens preceding the word “Act.” With this method, one can locate all the sentences that contain “Act” and find out which “Act” is the most frequently mentioned, and of most concern, during the past 10 years. In
Figure 7, a bar chart shows the most cited statutes (acts) in the disclosure reports during the past 10 years.
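The two matching strategies described above, a POS-tag pattern and a wildcard-token pattern anchored on “Act,” can be sketched with spaCy's rule-based Matcher. The wildcard pattern needs only a tokenizer; the POS pattern (shown as a comment) additionally requires a trained pipeline such as en_core_web_sm. The sample sentence is illustrative.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline suffices for token-text matching.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# POS-based pattern (requires a trained tagger, e.g. en_core_web_sm):
#   [{"POS": "PROPN"}, {"POS": "PROPN"}, {"TEXT": "Act"}]

# Wildcard pattern: any two tokens immediately followed by "Act".
matcher.add("ACT_PHRASE", [[{}, {}, {"TEXT": "Act"}]])

doc = nlp("Compliance with the Clean Water Act and the Federal Power Act is required.")
act_phrases = [doc[start:end].text for _, start, end in matcher(doc)]
```

Counting the extracted `act_phrases` across all disclosures yields the frequency ranking shown in Figure 7.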
The top 30 federal and state statutes cited in the disclosures, ranked by frequency, are displayed. The implication is that the higher the frequency of mention, the more relevant the statute is to sustainability-related activities. The extraction of the statutes is a novel contribution of this study. Further, we were also interested in the types of greenhouse gases mentioned most frequently in the companies’ disclosure reports during the past 10 years. A donut chart was developed by counting the occurrences of the various greenhouse gases mentioned in the companies’ disclosures and computing each gas’s share of the total count. This is shown in
Figure 8.
The chart shows that carbon dioxide is the most frequently mentioned greenhouse gas, followed by methane. Companies appear to pay close attention to emissions of carbon dioxide and methane, followed by other gases. The fact that greenhouse gases are mentioned in the disclosures at all indicates that companies are increasingly interested in addressing greenhouse-gas-driven climate change. To get a sense of how much a company’s disclosure changes over time, the content variance for each company was calculated. First, the NLTK POS tagging package was used to extract the top 20 most frequent noun keywords from each company’s most recent and oldest disclosure reports, and then cosine similarity was applied between the two vectorized lists. For example, the company Advance Auto Parts Inc. (ticker: AAP) has provided climate-change-related disclosures consistently from 2011 to 2020. For the 10 years of disclosure content, the top 20 most frequent noun keywords of its 2011 report were compared with those of its 2020 report. A content variance of 1 means the two disclosures’ top 20 keywords are identical, whereas a content variance of 0 means the two lists share nothing in common. The content variance was calculated for each company, and the results were visualized using a box-and-whisker plot for all sectors.
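Treating each top-20 keyword list as a binary vector, the content-variance (cosine similarity) computation reduces to a few lines of plain Python. The keyword lists below are hypothetical, not drawn from any actual filing.

```python
import math

def content_variance(keywords_a, keywords_b):
    """Cosine similarity between two keyword lists treated as binary vectors:
    |A ∩ B| / sqrt(|A| * |B|). 1 = identical lists, 0 = no overlap."""
    a, b = set(keywords_a), set(keywords_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical top keywords from a 2011 and a 2020 disclosure.
kw_2011 = ["emission", "gas", "energy", "regulation", "climate"]
kw_2020 = ["emission", "climate", "risk", "carbon", "impact"]

score = content_variance(kw_2011, kw_2020)  # a value between 0 and 1
```

Computing this score for each company's oldest and newest disclosures, then grouping by sector, yields the box-and-whisker distribution of Figure 9.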
Figure 9, a box-and-whisker graph, exhibits the content variance distribution across the different sectors of all companies that provided climate-change-related disclosures.
Figure 9 shows that the median content variance scores for eight sectors were all around 0.45. Since its box is skewed further downward, the transportation and communications division tends to have similarity scores closer to 0 than the other sectors. We concluded that the disclosure contents of companies in the transportation and communications sector vary more over time than those of other sectors. Doc2Vec similarity captures how similar a company’s disclosure is to other companies’ disclosure reports within the same industry. The methodology for Doc2Vec (
https://medium.com/wisio/a-gentle-introduction-to-doc2vec-db3e8c0cce5e (accessed on 1 September 2021)) is based on Word2Vec (
https://en.wikipedia.org/wiki/Word2vec (accessed on 1 September 2021)), and it performs unsupervised learning of continuous representations of larger blocks of text, such as sentences, paragraphs, or even whole documents. This text analysis technique was applied to create a numeric representation of a document regardless of its length. The most recent disclosure report for each company was used as the input content. The Doc2Vec model was trained with all the disclosure reports, with additional data, such as id and industry type, included as additional vectors. The Doc2Vec score was calculated for each disclosure. If two disclosure reports within the same industry carry similar Doc2Vec scores, one can deduce that the content of these two reports is similar in terms of the expressions and words/phrases used. Examining a Doc2Vec score by itself does not provide insight. However, if the variance of these scores is calculated, one can gain a sense of how variant the content is within a certain sector.
Figure 10 is a heat map showing the variance of Doc2Vec similarity scores of different sectors.
The darker and larger the box, the larger the variance, and the more variant the content within that specific sector. As shown in
Figure 10, the variance of Doc2Vec similarity scores was the highest in the wholesale trade sector, meaning the disclosure contents of companies in the wholesale trade sector vary more from one another than do those of companies in other sectors.
3.2. Disclosure Sentiment
The well-known and commonly used technique of sentiment analysis computationally identifies and categorizes opinions expressed in text. This helps, in particular, to determine the writer’s stance toward a particular topic (
https://en.wikipedia.org/wiki/Sentiment_analysis (accessed on 1 September 2021)). Given that one of the goals of this study was to explore the relationship between a company’s risk propensity and climate change, it is useful to obtain insight into a company’s position on regulatory, climate change, and renewable energy policies. VADER (Valence Aware Dictionary and Sentiment Reasoner) is a commonly used lexicon- and rule-based sentiment analysis tool that is particularly geared toward sentiment expressed on social media platforms but also performs well on texts from other domains (
https://pypi.org/ (accessed on 1 September 2021)). The advantage of VADER is it can be used to analyze large amounts of text data, with a relatively high accuracy, sometimes even outperforming the human raters (
http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf (accessed on 1 September 2021)). It is sensitive to both the polarity (positive/negative) and the intensity of emotions. Besides this, VADER also has a basic understanding of the context of the text it reads. For example, “love” is a word that conveys positive meaning, yet VADER can recognize “did not love” as a negative statement because of its context awareness. It generates polarity scores for a given text, including positive, negative, neutral, and an overall compound score. The negative sentiment scores were used in this study since they better reflect the challenge of climate-change-related risk to a company’s business. The challenge faced in this analysis was to extract pieces of clean text containing the topics that we desired to evaluate while discarding irrelevant data. An approach using NLTK sentence segmentation (a sentence tokenizer that splits a large disclosure corpus into sentences) was adopted. Each sentence was searched for the topics of interest and then passed to sentiment analysis. The overarching goal was to obtain sentiment scores for environmental regulation, climate change, and renewable technology to gain insight into the companies’ views on these three risk categories (excluding physical risk). Three sets of risk-category-related keywords were created (shown in
Table 5), including regulation, climate issue, and technology sentiments. The keywords related to regulation were scoped to include words such as “EPA” and “GHG.”
Three sets of sentiment analysis were conducted to capture companies’ perspectives on the three risk categories. The keywords were selected from the top 30 frequent keywords that were mentioned in the 25,428 disclosures.
Figure 11 is a line chart showing the compound sentiment score for the regulatory-related aspects of company disclosures. As shown in the chart, the increasing trend in the wholesale trade industry implies companies in this sector have complied with laws and regulations more effectively in recent years. The compound sentiment in the mining sector decreased from 2019 to 2020.
Figure 12 shows the compound sentiment score for the climate-change-related aspects of company disclosures. As shown in the chart, the compound sentiment score for agriculture disclosures decreased dramatically in recent years. This implies that the agricultural business has been severely affected by the recent escalation of climate change and that companies perceive climate change as a risk factor to their business.
Figure 13 shows the compound sentiment score for the renewable-energy-related aspects of company disclosures. As shown in the chart, the compound sentiment score in agriculture disclosures dropped during 2020. This is because the number of disclosures in the agricultural sector is low, so the average is taken over a small base, resulting in extreme values. We next turn our attention to topic modeling.
3.3. Topic Modeling (LDA)
The aim of latent Dirichlet allocation (LDA) is to assist in the classification of the content of disclosure reports and to find the mix of topics that companies focused on. LDA was applied to the disclosure content to elicit the key topics addressed by the disclosures [
26,
27]. By assigning each disclosure to a cluster, the model can glean the hidden characteristics in the text and generate labels for a supervised learning model. Since unsupervised learning models have a high degree of uncertainty, two implementations were cross-validated: Gensim’s LDA implementation (
https://radimrehurek.com/gensim_3.8.3/models/ldamodel.html (accessed on 1 September 2021)), which uses variational Bayes sampling, served as model 1, and MALLET’s LDA (
http://mallet.cs.umass.edu/ (accessed on 1 September 2021)), which is typically more precise than Gensim’s, served as model 2. Perplexity and coherence scores were used as evaluation standards to compare the topic models. Each disclosure’s content was assigned a dominant topic as its content label. This way, one knows which aspect a specific company’s disclosure report focuses on. Based on this method, six key topics were identified and are described below. Additionally,
Appendix A contains an overview of all 70 topics. It includes the most conspicuous terms in each, as well as the probability of occurrence of these terms in a topic (i.e., how likely the specific term is to appear in the topic). To tune the models, the text was lemmatized and bigrammed/trigrammed to group related phrases into one token for the model. Moreover, to extract the most sustainability-related topics, English stop words and certain common word lists were removed from the disclosure texts. In tokenization, the sentences were split into words, all words were lowercased, and all punctuation was removed. A stop-word list was developed, and those words were removed as well. Further, lemmatization and stemming were applied again to transform the words into their root form (e.g., “bought” and “buying” would be replaced with “buy”) so they could be analyzed as single objects.
Figure 14 provides a general understanding of the focus of topics before constructing the topic model. It shows the term frequencies of the most used words in all disclosure reports. One can observe that “emission,” “gas,” and “energy” or possibly “climate change” and other environment-related words are frequently mentioned in the disclosures.
To build the initial LDA model, the number of topics was set to 20, alpha to 0.01, and beta to 0.1, serving as a baseline for performance comparison. The perplexity level and coherence score were calculated for this purpose. The perplexity score measures how well the model represents or reproduces the statistics of hold-out data; lower values are better. The coherence score measures the semantic similarity between high-frequency words within a topic and thus assesses the quality of the learned topics; the higher the score, the better the quality of the topics extracted. From a performance perspective, one wants a model with a low perplexity level and a high coherence score, obtained by tweaking the number of topics and other coefficients.
The overall performance of the initial model is shown below:
Perplexity: −6.39885911951267
Coherence score: 0.4222068061701333
The initial model demonstrated reasonable performance, as indicated by the perplexity level, while the coherence score indicated the model needs additional refinement. Tuning is a way to maximize the performance of a model without overfitting or producing a large variance. A series of sensitivity tests were conducted to determine the number of topics (K), Dirichlet hyperparameter alpha (Document-Topic Density), and Dirichlet hyperparameter beta (Word-Topic Density). The tests were performed sequentially. One parameter was examined at a time by holding others constant.
Figure 15 shows the coherence score for the number of topics across one epoch of the validation set with fixed alpha = 0.01 and beta = 0.1. We plotted the coherence score against the number of topics and studied the trend as the number of topics ranged from 2 to 15. While the coherence score was volatile as the number of topics changed, it peaked when the number of topics was 6. As the coherence score appeared to decrease after 6 topics, K = 6 was chosen.
The evaluation metrics of the optimal model are shown below.
Perplexity: −7.196056573358106
Coherence score: 0.5500174631782340
In further comparing model 1 (Gensim’s LDA) to model 2 (mallet LDA), the perplexity level and coherence scores showed nearly identical performance with similar keywords in each topic. Therefore, to scope the analysis, only the results from model 1 (Gensim) were examined further. The results in
Table 6 were generated with this LDA model. Six topics were extracted along with their corresponding terms. On examination of the terms, each topic was assigned a label that most reflected the terms within. These topics and terms are indicative of the sustainability issues that companies are typically concerned with. Label categories to some extent reflect the companies’ emphasis on a certain topic. The six key topics are gas emission, carbon risk, climate change, loss and damage, renewable energy, and financial impact. When compared to the “issues” at
ceres.com, the relatively common topics were climate change and carbon risk. The other topics identified in this study included gas emission, loss and damage, renewable energy, and financial impact. Financial impact was a major topic identified in this study. This is consistent with the extant literature that has emerged on the association between sustainability and company/financial performance [
86,
87].
Figure 16 displays the topic distance map; the area within the circle characterizes the importance of each topic over the entire corpus, and the distance between the centers of the circles describes the similarity between the topics. The topic distance map was drawn using the built-in function of the pyLDAvis package to visualize the volume of topics and the keywords contained in the topic [
86,
87]. To the left, the different-size bubbles represent the topics; the larger the bubble, the more companies’ disclosures in that topic. The distance between the topics approximates the extent of the semantic relationship between the topics, and if the topics share common keywords, the bubbles will overlap (in proximity). The right part of
Figure 16 is a bar chart showing the relative importance of each term in topic 1. The red-shaded area describes the frequency of each term in each topic, while the blue bar shows the frequency distribution of the terms in all disclosures. Topic 1 is the largest topic in all the disclosures put together; the top 30 keywords in the right bar chart represent the most frequent keywords in this topic group. Studying the keywords in
Table 6 and
Figure 16, topic 1 seems to contain disclosures related to greenhouse gas emissions and corresponding legislative action. Topic 1 also overlaps with topic 4, sharing a few keywords. Topic 6 sits at a large distance from the other topics. This makes sense, as topic 6 describes the financial impact aspect of sustainability; as mentioned previously, its keywords describe issues different from those of the other, more traditional sustainability-related topics. Topics 1, 2, 3, and 4 are in proximity and collectively describe environment-related topics, since they share related keywords. Among the environmental topics, topic 1 focuses more on greenhouse gas emissions, and topic 2 pays more attention to carbon-related emissions affecting sustainability. Topic 3 shows a specific focus on climate change correlated with weather and risks. Topic 4 contains more natural-disaster-related keywords, while topic 5 deals with renewable energy technologies. Topic 6, of course, highlights the financial impact [
88,
89]. This topic is an interesting finding of this study.
Figure 17 is a highlight table showing the count of companies within different sectors that focus predominantly on specific topics. As seen, each sector has its own focus.
For example, manufacturing emphasized topic 2 (carbon risk) and topic 3 (climate change), while finance, insurance, and real estate focused on topic 4 (loss and damage). However, these are not mutually exclusive; every sector is associated with each of the six topics. Caution must be paid to the total number of companies in each sector, as raw frequency counts are not directly comparable across sectors.