1. Introduction
With the increasing digitalization of our daily lives, more and more data are being generated, originating from very heterogeneous sources. A huge part of these data is text written in natural language (e.g., social media), which brings several complex challenges regarding extraction and synthesis for analysis (e.g., slang, sarcasm, multiple languages) [1]. Such challenges obviously also apply to more complex texts, like those in research papers. As the number and heterogeneity of research papers increase worldwide, it becomes increasingly difficult to obtain a synthetic image of the topics being investigated. Dependability is an established, but also expanding and heterogeneous, field of research, covering a large and dynamic number of subfields and touching a huge variety of Information Technology dimensions [2]. In addition, contributions to dependability are brought in by different research groups and in multiple forms, which tends to make any analysis of the specific targeted subjects a complex problem. Thus, it is difficult to obtain a wide-perspective image of exactly which topics of research are new, active, collapsing, or have been ephemeral.
In this work, we contribute with an analysis of the state of the art by using Natural Language Processing techniques and Topic Modeling, in particular the Latent Dirichlet Allocation (LDA) algorithm, to collect and synthesize information regarding topics of interest in well-known Dependability conferences. The main goal is to gain insight regarding topics of interest and understand how the field has developed in terms of the subjects of study, including active and decaying areas of research. The final analysis should also reflect an image of what has been achieved in this field, since its inception. Note that we do not intend to present a new model, but aim at using existing techniques in order to analyze a previously unexplored area.
Previous work has already analyzed different research fields [3,4,5,6]. However, to the best of our knowledge, such an application has not yet been done in the field of Dependability. It is difficult to obtain a clear overview of what Dependability covers, because the field has changed a lot over the years and it includes an increasing number of topics. In engineering, dependability is a branch of systems engineering. It describes the ability of a system to perform required functions under given conditions, such that it can defensibly be trusted [7]. Mainly, it encompasses four components: reliability, maintainability, availability, and safety. System users can reasonably trust a dependable system. We analyze papers published in six well-known dependability conferences, namely: the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), the International Symposium on Software Reliability Engineering (ISSRE), the International Symposium on Reliable Distributed Systems (SRDS), the European Dependable Computing Conference (EDCC), the Latin-American Symposium on Dependable Computing (LADC), and the Pacific Rim International Symposium on Dependable Computing (PRDC), of which we selected PRDC for a more detailed analysis. We chose PRDC as a special focus because it is the only conference besides DSN for which all editions have been published on IEEE Xplore and are therefore available. DSN is the main conference in the field of Dependability and makes up a huge part of our dataset, so we would not expect it to show major differences from the general overall analysis. Furthermore, PRDC is an emerging conference and the number of papers published has remained relatively constant over the years. We decided to analyze the conferences available on IEEE Xplore because IEEE publishes the proceedings of most prestigious conferences in the field of dependability and also because it provides uniform access to the data. DSN, ISSRE, and SRDS are all ranked A according to the CORE ranking (http://portal.core.edu.au/conf-ranks/), PRDC is ranked B, and LADC is not ranked. The dataset includes 5004 papers published between 1988 and 2019 (i.e., all of the papers that are available online in IEEE Xplore and could be parsed). We aim to answer the following questions for the whole set of conferences, and then particularly for PRDC:
- (RQ1) which were the most important terms and topics discussed?
- (RQ2) which terms and topics represent recent trends and are possibly important for the future?
- (RQ3) how did specific terms of interest develop throughout the years?
The results show the presence of expectable terms and topics in the global set of conferences (e.g., ‘fault tolerance’, ‘fault injection’), although they also highlight the decreasing usage of certain terms, like ‘software reliability’. We also observed a strong presence of security-related terms, like ‘vulnerability’, despite the general focus of the conferences being dependability. PRDC shows clear similarities with the global set of conferences, although we also found a stronger presence of specific terms, like ‘cyber-physical’. We also observed a recent trend in terms related to artificial intelligence (e.g., ‘machine learning’, ‘anomaly detection’) and to blockchain systems.
In short, the main contributions of this work are the following:
- the application of NLP techniques on a large dataset that includes titles, keywords, and abstracts of research papers from the dependability field;
- the identification of the overall most frequent trends in the analyzed conferences and in PRDC separately;
- the identification of recent trends in the analyzed conferences and in PRDC separately; and,
- an analysis of the development of research trends over the entire time period and its interpretation.
The remainder of this paper is structured as follows. In Section 2, we first provide some background knowledge, useful for understanding our methodology. In Section 3, we discuss similar work related to our topic, before presenting our approach in Section 4. In Section 5, we present our results and the main findings. Finally, Section 6 discusses the conclusions and future work.
2. Background
This section introduces the basic NLP techniques used in our work, to help the reader comprehend the following sections. The most important are topic modeling, especially the Latent Dirichlet Allocation algorithm, and text preprocessing. Topic modeling is a text mining technique that uses unsupervised statistical machine learning methods for uncovering latent themes in large collections of documents, by grouping the words into word clusters, referred to as topics [8]. Text classification, on the other hand, is supervised: the classes must be defined in advance, and each text is assigned to one class or another, whereas in topic modeling the possible topics are not known beforehand and are discovered from the documents themselves. In our case, we opted for topic modeling, because we did not know all of the trends that the analyzed papers might contain. Arguably, the most popular topic model is Latent Dirichlet Allocation (LDA) [9], which may be used for finding new content, reducing the dimension for representing unstructured text, or classifying large amounts of text. In LDA, each document is considered to be a mixture of latent topics, each word in the document is assigned to a topic, and every topic is considered a mixture of words. These topics, whose number is fixed at the beginning, explain the common occurrence of words in documents. In newspaper articles, for example, the words “euro, bank, economy” or “politics, elections, parliament” often appear together. These sets of words then each have a high probability in a topic, and words can also have a high probability in several topics. As explained by Blei et al. [9], for each document d in a corpus D, LDA assumes the following generative steps: (1) choose the length N of the document; (2) choose the topic proportions θ ~ Dir(α); (3) for each of the N words: (a) choose a topic z_n ~ Multinomial(θ), and (b) choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n.
Figure 1 depicts the plate notation of the LDA algorithm. Plate notation is used to graphically depict variables that repeat. The input parameters are α and β, which control the per-document topic distribution and the per-topic word distribution, respectively. The output are the words in W. The boxes are plates that represent repetitions. According to the parameter β, the word distribution φ for each of the K topics is determined. The parameter α influences the topic distribution θ for each of the M documents. The documents to be analyzed are depicted by the outer plate M; the word positions in a particular document are represented by the inner plate N. Each of these positions is associated with a selection of topics and terms. Based on θ, the topic assignment Z is calculated for each specific term in W. The only observable variables are the words in W, which is why W is shown in grey. LDA only gives as output the terms that describe the different topics. All other variables are latent, which means that they are present, but not directly observed.
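The generative story described above can be illustrated with a small, purely hypothetical sketch; the topic word sets and proportions below are invented for illustration, whereas real LDA infers them from data:

```python
import random

random.seed(0)

# Two toy topics as word lists (purely illustrative; in LDA each topic is a
# learned probability distribution over the whole vocabulary).
topics = [
    ["fault", "tolerance", "injection", "error"],       # a "dependability" topic
    ["security", "attack", "vulnerability", "privacy"], # a "security" topic
]

def generate_document(topic_proportions, n_words):
    """Follow LDA's generative story: for each word position, draw a topic z
    from the document's topic proportions theta, then draw a word w from the
    chosen topic's word distribution."""
    words = []
    for _ in range(n_words):
        z = random.choices(range(len(topics)), weights=topic_proportions)[0]
        words.append(random.choice(topics[z]))
    return words

# A document that is 70% topic 0 and 30% topic 1.
print(generate_document([0.7, 0.3], n_words=8))
```

Inference in LDA is exactly the reverse of this sketch: given only the words, it recovers plausible topic distributions and assignments.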
To maximize the benefits of text analytics, including LDA, text preprocessing is common. Sarkar [10] points out the importance of this step, which should include cleaning the whole dataset towards results that are easier to analyze (e.g., without meaningless words, spelling mistakes, etc.). One of the initial steps is tokenization, which splits text into tokens, its smallest meaningful units, often words. Yet, in order to better understand the meaning of a document, it might be useful to go beyond single tokens and analyze n-grams, which are sequences of n consecutive tokens. While unigrams are single tokens (n = 1), n can be increased for longer sequences, such as bigrams (n = 2), e.g., ‘machine learning’, or trigrams (n = 3), e.g., ‘natural language processing’.
It is common to remove stop words, i.e., words that do not contribute enough to the meaning of a document. These are often functional words, like pronouns, articles, or prepositions, e.g., ‘you’, ‘the’, ‘at’. Preprocessing may further include lemmatization, which groups different forms of the same word under their lemma, the form that carries the word meaning. Common rules include changing plurals to singulars (e.g., both ‘computer’ and ‘computers’ become ‘computer’) and using the infinitive form of verbs (e.g., ‘write’, ‘written’, and ‘wrote’ become ‘write’). Stemming is an alternative to lemmatization, which represents words by their stem. Nevertheless, the stem is not always a valid word and it may not be possible to group irregular words (e.g., ‘wrote’ may become ‘wrot’ while ‘written’ becomes ‘writ’, so the two are not grouped). In some cases, additional steps may be taken, e.g., for expanding contractions, correcting spelling mistakes, or eliminating repeated characters.
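A minimal pure-Python sketch of such a pipeline follows; the stop-word set and lemma table are tiny illustrative stand-ins (in practice, NLTK provides full stop-word lists and a WordNet-based lemmatizer):

```python
import re

# Illustrative stand-ins for real stop-word lists and lemmatizers.
STOP_WORDS = {"the", "a", "an", "of", "in", "for", "we", "you", "at"}
LEMMAS = {"computers": "computer", "systems": "system",
          "wrote": "write", "written": "write", "tests": "test"}

def preprocess(text):
    # Tokenize: lowercase and keep alphanumeric runs.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop pure numbers and one-character tokens.
    tokens = [t for t in tokens if len(t) > 1 and not t.isdigit()]
    # Remove stop words, then map each remaining token to its lemma.
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]

print(preprocess("We wrote the tests for 2 fault-tolerant systems."))
# ['write', 'test', 'fault', 'tolerant', 'system']
```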
4. Methodology
In this section, we present our approach to analyze topic trends in well-established dependability conferences. In practice, we are interested in understanding which dependability topics are new or active, decaying, or have been most prominent. Our analysis goes through the following steps:
- (1) data acquisition and preparation;
- (2) exploratory analysis of frequent author keywords for each year;
- (3) topic modeling with LDA; and,
- (4) analysis of frequent terms in LDA topics for each year.
We began with the acquisition and preparation of the data, where we aimed at collecting information regarding the following well-known dependability conferences:
International Conference on Dependable Systems and Networks (DSN, formerly FTCS);
European Dependable Computing Conference (EDCC);
International Symposium on Software Reliability Engineering (ISSRE);
International Symposium on Reliable Distributed Systems (SRDS formerly RELDIS and RELDI);
Latin-American Symposium on Dependable Computing (LADC); and,
Pacific Rim International Symposium on Dependable Computing (PRDC).
The IEEE Xplore website [23] was our data source. Because it gathers the proceedings of the majority of the editions of the above-mentioned conferences, we scraped it to extract metadata.
Table 1 details our dataset, including the number of papers published per year and conference.
The dataset holds metadata regarding a total of 5004 papers, published between 1988, the year of the first edition of DSN (at the time, known as FTCS), and 2019. Most of the conferences were established later than 1988 and some did not have an edition every single year. Additionally, fourteen editions of these conferences were not included because their proceedings were published by a different publisher and were, thus, not available from IEEE Xplore, namely: four editions of EDCC (1994, 1999, 2002, 2005), three editions of LADC (2003, 2005, 2007), ISSRE 1990, and the six editions of SRDS before 1988.
For each paper identified, we used JSOUP (https://jsoup.org/), a web scraping tool, to extract the Digital Object Identifier (DOI), title, abstract, and also “keywords”, which IEEE Xplore separates into the following four types: author keywords (i.e., assigned by the authors); IEEE keywords, which come from the IEEE taxonomy (https://www.ieee.org/content/dam/ieee-org/ieee/web/org/pubs/taxonomy_v101.pdf), covering different scientific fields with a maximum of three sublevels; INSPEC controlled indexing keywords (i.e., listed in the Inspec Thesaurus, http://images.webofknowledge.com/WOKRS59B4/help/INSPEC/hs_controlled_index.html); and INSPEC non-controlled indexing keywords (i.e., assigned by INSPEC indexers, but in free-language words and phrases). It is important to mention that we eliminated extraneous documents found in the proceedings, namely those whose title contained any of the following words: committee, organizers, organization, abstracts, forum, subcommittee, message, workshop, keynotes, publisher, sponsors, acknowledgments, and symposium. This elimination was necessary in order to avoid documents in the dataset that are not actually research papers.
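The title-based filtering step can be sketched as follows; the sample titles are invented for illustration:

```python
# Title words flagging proceedings front matter rather than research papers.
EXCLUDE = {"committee", "organizers", "organization", "abstracts", "forum",
           "subcommittee", "message", "workshop", "keynotes", "publisher",
           "sponsors", "acknowledgments", "symposium"}

def is_research_paper(title):
    # A document is kept only if no title word matches the exclusion list.
    words = set(title.lower().split())
    return not (words & EXCLUDE)

titles = [
    "Message from the General Chairs",         # front matter -> dropped
    "A Fault Injection Framework for Clouds",  # research paper -> kept
]
print([t for t in titles if is_research_paper(t)])
```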
Before running LDA, we performed an exploratory analysis of common author keywords in the dataset. Author keywords are easy to acquire and very meaningful, because they were set by the authors themselves, even though they are, unfortunately, only available for papers since 2004. We decided not to put a special focus on the INSPEC and the IEEE keywords as they proved less meaningful than the author keywords (e.g., ‘proposals’, ‘hardware’, ‘frequency’), thus their analysis would lead to less particular insights. We mapped each different keyword present in the dataset to the respective total number of occurrences in the whole dataset. We also retrieved the count per year and per conference, for further analysis.
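The keyword counting can be sketched with standard-library counters; the records below are invented examples, not entries from the actual dataset:

```python
from collections import Counter, defaultdict

# Toy (year, conference, author keywords) records for illustration.
papers = [
    (2018, "DSN",  ["fault injection", "security"]),
    (2019, "PRDC", ["security", "machine learning"]),
    (2019, "DSN",  ["security"]),
]

total = Counter()                 # keyword -> occurrences in whole dataset
per_year = defaultdict(Counter)   # year -> keyword counts
per_conf = defaultdict(Counter)   # conference -> keyword counts
for year, conf, keywords in papers:
    total.update(keywords)
    per_year[year].update(keywords)
    per_conf[conf].update(keywords)

print(total["security"], per_year[2019]["security"], per_conf["DSN"]["security"])
# 3 2 2
```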
It is worthwhile mentioning that we also tried to identify meaningful terms by examining the most common terms (i.e., unigrams and bigrams) that were retrieved from the titles and the abstracts, but the results were not very helpful. For example, the most frequent unigrams in titles and abstracts were: ‘software’, ‘system’, ‘fault’, ‘data’, ‘network’, ‘model’ and ‘reliability’. The most frequent bigrams found were, for example, ‘large scale’, ‘real time’, ‘computer science’, or ‘case study’, which we found to be very generic, making it difficult to obtain helpful insights.
In order to identify the research topics in the identified papers, we then ran the LDA algorithm on the text of the abstracts and titles (as a whole) per year. As we mentioned in the Background section, LDA is the most popular algorithm for topic modeling. We were particularly interested in understanding the topics of relevance per year (and across all conferences). LDA is based on a repeated random selection of text segments, whereby the statistical accumulation of word groups is recorded within these segments. Thus, the algorithm computes the topics of the text collection, the words that belong to each topic, and their salience. LDA offers better quality and coherence than other available topic modeling algorithms, as shown by Saari [4].
Yet, before actually running LDA, we pre-processed the input data, which is especially important when running this algorithm [10]. For this, we used the Natural Language Toolkit (NLTK) [24] and applied the preprocessing techniques (see the Background section) that we deemed appropriate for our case, also considering related work [10]. First, we tokenized all the texts and converted the tokens to lowercase. We removed numbers, but not words containing numbers, so as not to exclude numeronyms (e.g., ‘industry4.0’, ‘3G’, ‘AIS256’), as they may carry relevant information. We also removed all words that are one single character long, as they are not useful. We then lemmatized all the words. After this, we used the gensim [25] Python library to compute bigrams for each paper (i.e., for each abstract and title), which becomes associated with both its respective single-word terms and combined tokens (e.g., if the text contains ‘[…] software reliability […]’, the output will contain the terms ‘software’, ‘reliability’, and ‘software_reliability’). Bigrams tend to be more specific and, in that sense, more informative (e.g., ‘software reliability’ is more informative than ‘software’). Additionally, some common unigrams are very hard to interpret when not associated with a second word (e.g., ‘base’, ‘core’, ‘case’, ‘solution’). We only considered bigrams that appear at least five times, as we observed that a lower threshold tends to produce meaningless combinations of terms (e.g., ‘sense_to’, ‘look_reveals’).
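The bigram step can be approximated with a small stand-in for gensim's Phrases model; the corpus and the min_count=2 threshold below are illustrative (the actual analysis uses a threshold of five):

```python
from collections import Counter

def add_bigrams(docs, min_count=2):
    """Append 'w1_w2' tokens for bigrams seen at least min_count times across
    the corpus, keeping the original unigrams as well (gensim's Phrases model
    plays this role in the actual pipeline)."""
    counts = Counter()
    for doc in docs:
        counts.update(zip(doc, doc[1:]))
    frequent = {b for b, c in counts.items() if c >= min_count}
    out = []
    for doc in docs:
        extra = ["_".join(b) for b in zip(doc, doc[1:]) if b in frequent]
        out.append(doc + extra)
    return out

docs = [["software", "reliability", "model"],
        ["software", "reliability", "growth"]]
print(add_bigrams(docs))
# [['software', 'reliability', 'model', 'software_reliability'],
#  ['software', 'reliability', 'growth', 'software_reliability']]
```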
An important step after adding bigrams is removing stop words. Besides common English stop words, available in NLTK, we added a number of selected stop words that we found to be meaningless in our context, as they are not specific to the dependability domain (e.g., ‘year’, ‘conference’, ‘author’, ‘approach’). Some other techniques were not necessary in our case, for instance, expanding contractions, correcting spelling mistakes, or eliminating repeated characters, because colloquial language and spelling mistakes are rather uncommon in research papers. In order to normalize words, we opted for lemmatization instead of stemming, because the results of the latter are harder to interpret, especially with bigrams.
Regarding the configuration of the LDA model, we performed an exploratory analysis of several different parameters before committing to the following. In LDA, the corpus is always represented as a bag-of-words of the documents, which means that the order of the words does not matter. Even though we experimented with smaller and larger numbers of topics, we decided to generate 20 topics, both because different numbers seemed to produce less informative topics and because 20 is a commonly used number for this purpose [8]. There are no precise instructions on how to optimally set the number K of topics to be generated, since LDA is an unsupervised method. Various researchers have developed approaches for determining K in the best possible way [6,26]. However, other research recommends not relying on these methods, but instead checking the quality of the topics obtained with different numbers [27].
We opted to represent each topic with 20 words, because this allows for a more detailed characterization of the topic. We empirically noted that a low number of words tends to make topic labeling harder, while a high number tends to add irrelevant terms. We set the number of passes to ten, which means that the model is trained in ten sweeps over the entire corpus. This number needs to be high enough for the documents to converge, so that there are not too many diverse topics (in the worst case, one topic for each document). On the other hand, if the number of passes were too high, we would get twenty very similar topics, because they would become increasingly convergent. As output, we obtain a list of topics sorted in descending order, from the most frequently to the least frequently assigned topics.
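To make the roles of K and the number of passes concrete, here is a toy collapsed Gibbs sampler for LDA over integer word ids. This is a deliberately simplified stand-in: gensim's LdaModel uses variational inference rather than Gibbs sampling, but K (number of topics) and passes (sweeps over the corpus) play analogous roles:

```python
import random

random.seed(1)

def lda_gibbs(docs, n_words, K, alpha=0.1, beta=0.01, passes=10):
    """Toy collapsed Gibbs sampler for LDA over documents of word ids."""
    z = [[random.randrange(K) for _ in doc] for doc in docs]  # topic per word
    ndk = [[0] * K for _ in docs]            # document-topic counts
    nkw = [[0] * n_words for _ in range(K)]  # topic-word counts
    nk = [0] * K                             # total words per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(passes):                  # each pass is one corpus sweep
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                  # remove current assignment
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # Resample the word's topic from the collapsed conditional.
                weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta)
                           / (nk[k] + n_words * beta) for k in range(K)]
                t = random.choices(range(K), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return nkw

# Two tiny "documents" over a 4-word vocabulary.
docs = [[0, 1, 0, 1], [2, 3, 2, 3]]
nkw = lda_gibbs(docs, n_words=4, K=2)
print(nkw)  # topic-word count matrix
```

Too few passes leave the initial random assignments largely in place; more passes let the counts, and hence the topics, converge.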
After this, we wanted to analyze the terms in the LDA-generated topics. In total, we considered a timespan of 32 years, which, multiplied by 20 topics, equals 640 topics. As each topic consists of 20 terms (i.e., 20 words), we end up with 12,800 words, including possible duplications. Note that, due to the nature of the LDA algorithm, which aims at uncovering the most meaningful terms for the analyzed documents, we were expecting to obtain a very different set of terms than the author keywords, which makes them interesting to analyze. We then selected the most frequent unigrams, namely those with at least ten occurrences, when considering all of the terms from all generated topics (i.e., all the years). We also selected the most frequent bigrams (at least two occurrences), but we did this separately, as we observed that they would rarely appear in the list of all the most frequent terms. Nevertheless, we still had to eliminate some less informative terms manually, e.g., ‘number_of’, ‘paper_we’, ‘such_a’, ‘show_that’.
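Counting the most frequent terms across all generated topics reduces to a frequency count; the topic lists below are an invented miniature stand-in for the 20 terms × 20 topics × 32 years output:

```python
from collections import Counter

# Miniature stand-in: year -> list of LDA topics, each a list of terms.
topics_per_year = {
    2018: [["performance", "fault_injection", "cloud"],
           ["security", "machine_learning", "cloud"]],
    2019: [["performance", "anomaly_detection", "machine_learning"]],
}

unigrams, bigrams = Counter(), Counter()
for topics in topics_per_year.values():
    for topic in topics:
        for term in topic:
            # Combined tokens join bigram words with '_'.
            (bigrams if "_" in term else unigrams)[term] += 1

print(unigrams["performance"], bigrams["machine_learning"])
# 2 2
```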
Finally, we relied on the outcome of this process for analyzing how the popularity of the key terms generated by LDA changed over time. In particular, we were interested in analyzing the frequency of LDA terms (and also author keywords) over the whole timespan considered (i.e., the 32 years), but also focusing on the last three years, with the goal of identifying recent trends.
5. Results
In this section, we present the results of the analysis performed according to the methodology, considering two main views: (a) a global view of the whole dataset; and (b) a particular view of the PRDC conference. The analysis goes through these two views in turn.
5.1. All Conferences Analysis
In this section, we analyze the papers gathered from all six conferences. The analyzed time frame is from 1988 to 2019, i.e., 32 years. Some conferences published more papers than others, e.g., DSN or ISSRE, which of course means that these conferences have a stronger influence on the analysis. This section’s goal is to give a general overview of the dependability field, regardless of the individual conferences.
We start with the identification of common keywords when considering all conferences. Table 2 shows the top ten most frequent terms used as author keywords in all conferences and, among the remaining words, the top fifteen most frequent terms in the last three years.
As we can see in Table 2, regarding the whole analysis period, and despite the focus of the conferences being dependability, there is a large number of papers using ‘security’ as a top keyword. The results also show the presence of expectable keywords, like ‘fault injection’, ‘fault tolerance’, or ‘distributed systems’. It is worthwhile noting the presence of ‘cloud computing’ (as a domain of application) and also ‘machine learning’, which is likely related to the known recent trend in this area [28]. If we only consider the last three years, the results mostly confirm the common sense that suggests there is additional focus on topics like blockchain systems, the Internet of Things, and privacy.
In Table 3, we present the top ten topics identified by LDA, out of the twenty produced by the algorithm for all papers present in our dataset. The topics have been ordered by LDA according to the number of papers they describe, and we also show the specific terms composing each topic (limited to the top 10 terms, for space reasons). LDA orders these terms by their salience, which indicates the importance of a certain term to the topic. It is difficult to describe each topic with a title, as the topic loses meaning when reduced to one single term. When we tried to describe the topics, there were repetitions, so we decided to leave them without titles.
Based on the LDA topics, we separately identified the most popular unigrams and bigrams, i.e., those appearing more frequently in the whole set of generated topics.
Table 4 presents the results of this identification. Even though the frequencies for this table were counted on the twenty topics for each single year, we can see a clear connection between this table and the topics generated for the papers for all years, shown in Table 3. For example, the term ‘performance’ is the most frequent term for the whole period when considering the topics for each year, and it is also the most frequent in our presented LDA topics for all years.
We found the unigram list to be quite generic (e.g., ‘performance’ may have different meanings, depending on the context) and, as such, we found the bigram information much more informative. In this latter case, we find the top three terms clearly exposing the nature of the conferences (i.e., ‘fault tolerance’, ‘software reliability’, ‘fault injection’), with the remaining showing areas of application (e.g., ‘distributed system’, ‘operating system’) and terms that are likely to be related with the evaluation of proposals (e.g., ‘test case’, ‘case study’, ‘large scale’). We also note the presence of terms, like ‘soft error’ and ‘safety critical’, which further characterize the work being published.
The analysis of the last three years’ unigrams again yields little useful information, although we may highlight ‘cloud’, ‘bug’, and ‘safety’. However, if we analyze the last three years’ bigrams, it is worthwhile mentioning the obvious presence of ‘machine learning’, ‘smart contract’ (i.e., blockchain code), and also ‘anomaly detection’ (which is very much related to machine learning). The remaining terms are quite diverse, but we highlight ‘static analysis’, which would not be an obvious presence in this top of recent bigrams.
There are several terms with a simultaneous presence in the LDA topics and in the keywords, which emphasizes the presence of certain subjects of interest. When considering the whole period, this is the case of ‘security’, ‘fault tolerance’, ‘fault injection’, and ‘distributed systems’. If we look at the last three years, we also find ‘safety’ and ‘static analysis’ being used as keywords and appearing in popular LDA terms. Then, we find a few more terms that do not exactly map (as words), but are semantically related. This is the case of ‘cloud computing’ and ‘cloud’, and also ‘smart contract’ and ‘blockchain’. It is also worthwhile mentioning that we found three terms that are present as keywords in the overall period, but are popular in the LDA terms only for the last three years: ‘cloud computing’, ‘anomaly detection’, and ‘machine learning’, which LDA thus marks as particularly interesting in recent years.
After analysing popular terms, we handpicked a few terms of interest and analyzed their popularity across the whole period, i.e., from the beginning of the conferences.
Figure 2 and Figure 3 show the selected unigrams and bigrams and their relative popularity across time. We excluded the years 1988–1990, as our handpicked terms rarely or never appeared in those years, due to the very small number of papers in the first years analyzed.
Figure 2 very clearly shows that the use of the term ‘dependability’ is declining. This is very likely due to the fact that, nowadays, although the focus is obviously still dependability, the aspects being researched tend to be finer-grained and much more specific, whereas in the early years of the conferences dependability in itself, as a whole, was a sufficiently novel and detailed term to be used in abstracts. There is a noticeable presence of ‘bug’ until recently, with the same happening with ‘safety’ and ‘security’. The term ‘vulnerability’ accompanies ‘security’, although with a lower relative frequency. More recently, we find ‘resilience’ (with a relatively low frequency) and ‘cloud’, associated with the advent of cloud computing. Additionally clear is the recent presence of ‘blockchain’, which is in line with its known recent popularity.
As Figure 3 shows, ‘fault injection’ is currently the most popular term and, despite some variation, its popularity followed a globally positive trend. It is interesting to note the strong decrease in the use of ‘software reliability’ as a term of relevance in the abstracts. Similarly, there is a trend for the term ‘web services’, starting in 2003 but closing by 2016. At the same time, we see ‘operating system’ with a fairly constant presence throughout the years, with the same happening with ‘soft error’, although the latter has been used only from the beginning of the 2000s. ‘Distributed system’ and ‘fault tolerance’ remain in use and popular, although their relative frequency has decreased. Recent and rising terms are ‘machine learning’ and ‘anomaly detection’, which clearly show the interest of the community in artificial intelligence topics and the confluence of dependability with these fields.
5.2. PRDC Conference Analysis
We now take a closer look at the IEEE Pacific Rim International Symposium on Dependable Computing (PRDC) and begin with the identification of common keywords for PRDC, which we summarize in Table 5.
Table 6 presents the top ten topics identified by LDA for PRDC, and their first ten associated terms. We can see that a topic built around security-related terms appears in one of the first places both for all the conferences and for PRDC alone. Topic number 7 in PRDC could be related to networks, and topic number 6 in the overall analysis probably also describes papers connected to networks. The term ‘dependability’ appears as the first term of the first topic for PRDC, which clearly shows its importance to the conference, whereas for all of the papers in the dataset the same term only appears in topic number 10.
As previously noted, we calculated the most popular unigrams and bigrams present in the LDA topics, which are shown in Table 7.
Once more, the unigrams revealed little information, with the exception of a few terms, like ‘security’ or ‘safety’. Still, it is interesting to notice network-related terms at the top two positions. In the bigrams, we find that half of the top ten terms match those previously observed for the whole dataset, namely ‘fault tolerance’, ‘fault injection’, ‘software reliability’, ‘soft error’, and ‘safety critical’. We also find ‘fault detection’ and ‘error detection’ to be strongly associated with PRDC, and also ‘web service’, which is likely a consequence of the time period and number of editions of PRDC being smaller.
When considering the last three years, we may emphasize ‘attack’, ‘cloud’, ‘iot’, ‘blockchain’, or ‘privacy’, which are the most descriptive unigrams in the list. The bigrams show the interesting case of ‘cyber physical’ being the top term (not found in the top terms for the whole dataset). We then find the expected case of ‘machine learning’, followed by ‘anomaly detection’ and cloud-related terms like ‘cloud computing’. The frequency of the top six terms is much larger than that of the remaining six, which should, in this sense, be considered less informative.
Finally, we handpicked a few terms of interest from PRDC and analyzed their popularity over the time period of this conference. Figure 4 and Figure 5 hold a visual representation of the relative prevalence of these terms over time.
Regarding Figure 4, there are a few cases of interest, namely the relatively strong presence of ‘ip’ and also ‘net’ throughout the period. The terms ‘safety’ and ‘security’ have gained popularity over time. ‘Dependability’ used to be a frequent term in the early years of the conference, but it is now much less important. The presence of the term ‘vulnerability’ is also noticeable, although its popularity globally dropped over time, and the term ‘bug’ closely follows its pattern. The presence of ‘resilience’ is scarce throughout; among the selected terms, ‘cloud’ has seen a boost since 2009, and ‘blockchain’ as well, but only in the last three years. Similar to the whole dataset, ‘availability’ lost interest over the years.
Figure 5 presents the popularity of the bigrams. We must emphasize the general presence of ‘fault tolerance’ and ‘fault injection’ as frequent terms. Additionally, we noticed the case of ephemeral terms, namely ‘smart grid’, ‘web service’, and ‘failure detector’. ‘Machine learning’ and ‘anomaly detection’ show positive trends, in line with what was observed for the whole dataset. Finally, we found ‘operating system’ to have a weaker presence than in the whole dataset.
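The term-popularity analysis above amounts to counting unigram or bigram frequencies per year. The following is a minimal sketch of that counting step, assuming lowercase paper titles grouped by year; the titles and years shown are invented placeholders, not entries from the actual dataset.

```python
from collections import Counter

def bigram_counts(titles):
    """Count word bigrams across a list of lowercase paper titles."""
    counts = Counter()
    for title in titles:
        words = title.split()
        counts.update(zip(words, words[1:]))  # consecutive word pairs
    return counts

# Invented placeholder data, for illustration only.
papers_by_year = {
    2018: ["fault tolerance in distributed systems",
           "anomaly detection for cloud platforms"],
    2019: ["machine learning for anomaly detection",
           "anomaly detection in web services"],
}

# Per-year top bigrams, analogous to the trends plotted in Figure 5.
for year, titles in sorted(papers_by_year.items()):
    print(year, bigram_counts(titles).most_common(2))
```

In practice, the raw counts would be normalized by the number of papers (or bigrams) in each year, so that the growth of the conferences does not inflate the apparent popularity of every term.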
6. Conclusions and Future Work
In this work, we analyzed 5005 research papers from six well-established dependability conferences using the Latent Dirichlet Allocation (LDA) algorithm. We were able to identify the most important topics of the analyzed conferences and to interpret some of them. Over the years, the dependability conferences have adapted to the prevailing IT trends, and the number of papers published has increased with time. The most popular topics in the analyzed conferences were Fault Tolerance, Software Reliability, and Fault Injection, but also Security. Contemporary trends include machine learning, anomaly detection, Internet of Things, blockchain systems, and especially cloud computing. We took a closer look at the PRDC conference, which showed a few resemblances to the whole dataset (e.g., ‘safety’ and ‘security’ are generally present, ‘machine learning’ is a recent trend), but also a few clear differences (e.g., ‘operating system’ is a less frequent term, while ‘cyber physical’ is the top term associated with PRDC but does not appear in the top ten for the whole dataset). We observed that, over time, the classical topics merge with current topics, such as artificial intelligence, and that a greater diversity of topics has developed over the years. The conferences are becoming more diverse and also deal with topics that would not necessarily be classified as belonging to the Dependability field in the classical sense. The current trends in the field of Dependability largely coincide with the prevailing trends in computer science in general.
However, there are a few threats to the validity of this work. A first clear point is the lack of the conference editions held by publishers other than IEEE (i.e., 14 editions missing in total). Even so, the dataset includes the large majority of the works. A related aspect is the number of conferences analyzed, which could be extended, although we believe the selected ones to be the most strongly connected with dependability. We must also mention that several operations had to be carried out manually, such as selecting stop words and terms for elimination. The LDA algorithm selects the most meaningful terms of a certain input, but these may not always be the terms that are the most meaningful for the domain. Without this manual elimination the analysis would lose much of its meaning, but at the same time it could add bias to the results. The manual elimination was checked by a second researcher, in order to ensure that no particularly important information was lost as a direct consequence of this step.
We also analyzed selected terms, which may add bias to the discussion. Despite this, the selection was agreed upon by two researchers, as a way of reducing any individual bias. Finally, we chose to analyze only the author keywords, whereas the other types of keywords could also provide relevant information about the development of research trends. Again, this decision was taken after analyzing the different types of keywords, and it was made in favor of the set we found to be the most descriptive.
Our work helps to provide a clearer view of the research being published in well-known dependability conferences. It may help researchers in selecting active research topics of interest and in steering clear of decaying ones.
As future work, we intend to predict future research trends by training different models on the data. We also intend to explore the meaning of the terms using deep semantic representations, which can bring in further information and improve the understanding of research trends.