1. Introduction
Bibliometrics, as a science-related discipline, aims to provide a set of tools for the assessment of scientific production. From its origin at the beginning of the 20th century to the present day, bibliometric studies have approached this task from different points of view. In 1917 Cole and Eales carried out the first bibliometric study, a statistical analysis of publications on comparative anatomy [1], thus initiating the use of bibliometrics for the measurement of scientific activity. Following the same approach, in 1926 Lotka analyzed the scientific production of researchers and formulated the so-called Lotka's Law of Productivity, according to which a small number of authors account for the greatest number of publications, while the great majority of authors publish very little [2]. In 1956, Price formulated the Law of Exponential Growth of Scientific Information, stating that scientific information grows at a much faster rate than other social processes. Price also observed that the scientific literature loses relevance rapidly, although not uniformly across disciplines: in the experimental sciences and technology the growth in the number of publications is greater and faster, but so is their decline, in contrast to the behavior found in the humanities and social sciences. Later, in 1963, Price introduced a new element in the development of bibliometrics by relating the growth of science to scientific communication [3].
A second aspect of bibliometrics is oriented to the analysis of the references contained in the scientific literature. In 1927 Gross and Gross made the first count of references appearing in the Journal of the American Chemical Society, studying their frequency and the sources from which they originated, and applying the results to the selection of journal subscriptions of interest [4]. In 1934 Bradford analyzed the distribution of articles across journals and formulated Bradford's Law of Dispersion, according to which a small number of journals accounts for the largest share of the bibliography on a specific topic [5]. If scientific journals are arranged in decreasing order of productivity of articles on a given subject, one can distinguish a core of journals specialized in that subject, followed by successive zones that each contain approximately as many articles as the core but spread over an increasing number of journals. This can be seen as a precursor of the classification of journals by scientific categories.
The third point of view focuses on the analysis of the impact and visibility of research through citation activity. As early as 1873 Shepard developed a citation index following the codification applied to federal court judgments in the United States. However, it was not until 1936 that Cason and Lubotsky created the first citation network, identifying the links between psychology journals [6]. Undoubtedly, though, the precursor of citation analysis is Garfield, who in 1955 published in the journal Science a proposal for a citation index [7], based on Shepard's concept, which made it possible to relate an article to the other articles citing it. In this way it became possible to assess the significance of a research paper and its impact, and for researchers to know how their publications were being used. This is the renowned Science Citation Index (SCI), created by Garfield himself at the ISI (Institute for Scientific Information). In the early 1960s, Garfield and Sher designed the Impact Factor.
The purpose of the Impact Factor was to serve as the methodological instrument for selecting the journals included in the Science Citation Index, since it was unfeasible to cover all existing scientific journals. Years later, in addition to the Science Citation Index (focused on the experimental and technological sciences), ISI created the Social Sciences Citation Index (SSCI), oriented to the social sciences, and the Arts and Humanities Citation Index (AHCI) for the arts and humanities. These three databases have been a milestone in bibliometrics and have become benchmarks in the evaluation of publications, researchers, and institutions. They are part of the Web of Science database platform, originally known as ISI Web of Knowledge and currently owned by Clarivate Analytics.
Although they have been the main benchmark since the 1960s, supported also by the relationship that Garfield established in 1979 between the nature of the research and its potential to be cited, they have nevertheless been the focus of multiple criticisms [8]. Earlier, in 1976, Pinski and Narin had warned of the bias in favor of reviews, which tend to have a higher impact factor, and of the fact that all citations are weighted equally in the calculation of the impact factor [9]. To correct this deviation, they suggested an "influence methodology" that gives each journal a weight independent of its size. In 1986 Tomer observed that "There is no distinction in regard to the nature and merits of the citing journals" [10]. These disagreements have been ongoing for a long time, and they are still relevant today.
For example, in 2001 Tijssen, Visser and Van Leeuwen questioned citation analysis as a measure of research quality, since citation practices vary considerably across disciplines [11]. More recently, shortcomings such as the asymmetry between numerator and denominator, differences between disciplines, an insufficient citation window, and the skewness of the underlying citation distributions have also been analyzed by Larivière and Sugimoto in 2019 [12].
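To make the numerator-denominator asymmetry concrete, the standard two-year JIF can be written as follows (a simplified formulation; the numerator counts citations to all document types, while the denominator counts only "citable items", essentially articles and reviews):

\[
\mathrm{JIF}_{y} = \frac{\text{citations received in year } y \text{ to items published in years } y-1 \text{ and } y-2}{\text{citable items published in years } y-1 \text{ and } y-2}
\]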
The JCR Impact Factor (SCI, SSCI) is not the only metric that measures journal impact. The SJR (SCImago Journal Rank) shows the visibility of the journals contained in Scopus since 1996; the metric applies not only to journals but also to book series and conference proceedings. Based on citations, it reflects the quality and reputation of a journal in its thematic fields, counting the citations received by the articles of a journal over a period of three years and giving greater weight to citations coming from highly reputed journals. The SJR indicator attempts to correct the deviations described above by weighting links on the basis of citation proximity, extending the number of years considered in the citation window, and setting thresholds for journal self-citation [13].
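The prestige-weighting principle underlying SJR can be illustrated with a simplified PageRank-style recursion (an illustration of the idea only, not the exact SJR formula, which also incorporates the three-year citation window, self-citation limits, and size normalization):

\[
P_j = \frac{1-d}{N} + d \sum_{i} P_i \, \frac{C_{ij}}{\sum_{k} C_{ik}}
\]

where \(P_j\) is the prestige of journal \(j\), \(C_{ij}\) is the number of citations from journal \(i\) to journal \(j\), \(N\) is the number of journals, and \(d\) is a damping factor; citations from high-prestige journals thus transfer more prestige than citations from low-prestige ones.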
At the end of 2016 [14], Scopus established a new metric, CiteScore, which extends the citation window to four years and includes all types of documents. On the one hand, this eliminates the differences between document types; on the other hand, some critics state that the index benefits Elsevier publications, which tend to publish a lower proportion of articles than other publishers [15].
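For reference, the current CiteScore relates citations and documents over the same four-year window (a simplified formulation; the document types counted have varied between the original 2016 definition and the 2020 revision):

\[
\mathrm{CiteScore}_{y} = \frac{\text{citations received in years } y-3 \text{ to } y \text{ by documents published in years } y-3 \text{ to } y}{\text{documents published in years } y-3 \text{ to } y}
\]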
A further recent development is the transition to computing the impact factor with respect to the online publication date rather than the print publication date, as has been done until now. Under the current system, some journals take more than a year between online and print publication, so an article can accumulate citations online and, by the time it appears in print, its citation count is higher than that of articles in other journals. There is therefore a trend towards a model in which the online publication date will be considered for the computation of the Journal Impact Factor (JIF) [16].
This change poses a problem for databases that do not record an online publication date. The Web of Science Core Collection began indexing online-first articles in December 2017 [17]; even so, in the case of Web of Science, half of the indexed journals lack this information [16]. If a publication appears online in the same year as in print, there is no mismatch, since the JIF refers to the same year. This is not the case for journals published online in one year and in print in another. Clarivate is considering the effects of adopting two new counting models: one pre-2020 and one post-2020 [18].
Bibliometrics has thus progressed considerably from its origins to the present day. At present, there is a significant increase in the number of publications on this discipline, closely linked to the exponential growth of science. This body of work has been classified into three major approaches [19]:
Bibliometric performance studies on authorship and production: they focus on analyzing the profiles of authors according to elements such as their affiliation, country, and the production of articles, examining which are the most cited or relevant;
Bibliometric studies on topics: they focus on the main topics dealt with, as well as their relationships or evolution in a specific topic;
Studies on research methodologies: they focus on the research methods and techniques used to develop the research papers published in the journals.
Taking all these approaches into account, how can bibliometrics be defined? From a quantitative point of view, Pritchard in 1969 described it as "studies aimed at quantifying the processes of written communication" [20]. In 1987, Broadus defined bibliometrics as the "branch of research concerned with the quantification of the physical units of publications, bibliographic citations and their surrogates" [21]. A broader concept is introduced here, since it establishes relationships between publications and bibliographic links or co-citation. Moed in 1989 defined it as the "discipline that deals with the collection, processing and management of bibliographic data from the scientific literature" [22]. From a second point of view, bibliometrics has been defined as a tool for analysis and evaluation. In 1989 White and McCain defined it as "the quantitative study of publications as reflected in the literature, in order to provide evolutionary models of science, technology and research" [23]. Spinak in 1996 referred to bibliometrics as the study of the organization of scientific and technological sectors based on bibliographic sources and patents, in order to identify authors, their relationships, and trends [24]. Along the same lines, other authors describe bibliometrics as the discipline that tries to measure scientific and social activity and to predict its trends by analyzing the literature [25].
Other concepts related to bibliometrics are scientometrics and informetrics. Scientometrics applies bibliometric techniques to science and examines scientific development and policies. Informetrics is more focused on the quantitative aspects of measurement and the application of mathematical models.
Bibliometrics and bibliometric indexes form a whole that serves to assess and measure scientific production in all its aspects. To measure, it is necessary to evaluate a set of data that are collected in databases specialized in giving visibility to scientific publications. A bibliometric index is a parameter that measures some aspect of scientific activity and allows the impact of research to be assessed in the different fields of science. The two databases that allow this analysis are Web of Science and Scopus, both with a clearly commercial orientation. Based on these two databases, Clarivate and Elsevier have developed applications that allow organizations to assess their research from different perspectives in order to establish and evaluate strategies based on reliable data.
InCites [26] uses data from the Web of Science Core Collection since 1980 to facilitate the analysis of organizations (activity, impact, collaborations) and to make comparisons. It allows searches by researcher or research group in order to analyze their production, while searches by area of knowledge give an overview of emerging fields. It is also possible to analyze the journals in which the work is published and the funding agencies. All these variables (affiliation, researcher, area, publication source, funding) can easily be combined to perform analyses applying different metrics (productivity, impact, collaboration, open access) and to generate all kinds of reports. As a novelty, since December 2020 InCites allows the analysis of topics, classifying them into macro, meso and micro topics, thanks to the collaboration between ISI and the Centre for Science and Technology Studies (CWTS) and the use of the algorithm developed by CWTS for detecting and connecting communities [27].
Based on the analysis of data from Scopus [28], SciVal offers access to more than 50 million publication records (post-1996) from over 22,000 journals and more than 5000 publishers worldwide. It analyzes the scientific output of more than 230 countries and 14,000 institutions, making it possible to visualize research performance, make comparisons, analyze trends, and evaluate collaborations. It also allows the analysis of topics, classifying them by Topic name and Topic Cluster. Like InCites, SciVal can generate data analysis and visualization reports combining many metrics that assess economic impact, productivity, citation impact, usage, collaboration, and communication.
There is a large number of bibliometric metrics that allow the evaluation of scientific activity, but it is important to use these metrics correctly: it is necessary to consider what is to be measured, apply the appropriate metric, detect possible deviations, and make an adequate analysis. In this regard, the 2015 Leiden Manifesto sets out 10 basic principles that should not be forgotten when using metrics [29], and the San Francisco Declaration on Research Assessment sets out 18 recommendations in the same direction [30].
The first goal of this research is to analyze the context of all the bibliometric studies carried out from 1996 to 2020: to discover whether there is any bias towards particular scientific categories, whether there are countries or institutions that devote a great effort to this issue, and, finally, to analyze how these works are indexed (e.g., whether they are mostly considered reviews or articles) and what level of citations they receive in comparison with the categories in which they are indexed. The second main goal is a case study of the Medicine and Environmental Sciences categories.
2. Materials and Methods
This analysis was based on searches of the Scopus and Web of Science databases. A previous study has pointed out that WoS is a confusing concept, as many institutions may subscribe to only a customized subset of the entire Web of Science Core Collection; it should be clarified that our study is conducted on the whole of WoS [31]. Although the historical content of Scopus dates back to 1788, the search was limited to the period from 1996 (when the analysis of Scopus data in SciVal begins) to 2020. In the case of Web of Science, the data collected in this database begin in 1960 and the analyses in InCites begin in 1980; in order to allow a correlation of the results presented in this work, the WoS search was also limited to 1996-2020.
The search was performed using the same criteria: the term “bibliometric” in the title of the publication and in the keywords assigned by the author. The results of both searches were exported from Scopus to SciVal Benchmarking and from WoS to InCites Analyze.
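For illustration, these searches correspond approximately to the following advanced-search expressions (a reconstruction, since the exact query strings and the way the two fields were combined are not reproduced here), together with the 1996-2020 publication-year limit:

Scopus: TITLE(bibliometric) AND AUTHKEY(bibliometric)
Web of Science: TI=(bibliometric) AND AK=(bibliometric)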
Data processing, both of the Scopus and WoS results and of the SciVal and InCites outputs, was carried out with different tools: the Scopus API was used for automatic data retrieval [32], and Microsoft Excel, Gephi, and ArcGIS were used for the analysis and representation of the results (see Figure 1).
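As an illustration of the automatic retrieval step, a minimal Python sketch against the Scopus Search API could look as follows (the API key, the query string, and the output file name are assumptions; the endpoint, header, and response fields are those documented for the Scopus Search API):

import csv
import requests

API_KEY = "YOUR-ELSEVIER-API-KEY"  # assumption: a valid Scopus API key
URL = "https://api.elsevier.com/content/search/scopus"
# Illustrative query; the exact query used in this study is not reproduced here.
QUERY = "TITLE(bibliometric) AND AUTHKEY(bibliometric) AND PUBYEAR > 1995 AND PUBYEAR < 2021"

def fetch_all(query, page_size=25):
    """Page through the Scopus Search API and yield result entries."""
    start = 0
    while True:
        resp = requests.get(
            URL,
            headers={"X-ELS-APIKey": API_KEY, "Accept": "application/json"},
            params={"query": query, "start": start, "count": page_size},
        )
        resp.raise_for_status()
        results = resp.json()["search-results"]
        entries = results.get("entry", [])
        if not entries:
            break
        yield from entries
        start += page_size
        if start >= int(results["opensearch:totalResults"]):
            break

with open("scopus_bibliometric.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "source", "year", "citations"])
    for e in fetch_all(QUERY):
        writer.writerow([
            e.get("dc:title", ""),
            e.get("prism:publicationName", ""),
            e.get("prism:coverDate", "")[:4],
            e.get("citedby-count", ""),
        ])

Records retrieved in this way can then be cross-matched with the SciVal and InCites exports in Excel.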
Topic classification is done at the document level [33]. A Topic in SciVal covers a collection of documents with a common intellectual interest [34]. Over time new Topics appear and, as Topics are dynamic, they evolve. Each document is assigned a Topic consisting of three elements, for example: Intellectual Structure; Co-citation Analysis; Scientometrics. Topics are based on the citation-network clustering of 95% of the Scopus content (all documents published since 1996), taking as a reference the direct analysis of citations through the reference lists of the documents. As newly published documents are indexed, they are added to Topics using their reference lists; this makes Topics dynamic, and most grow in size over time. New Topics represent research areas that have experienced a significant acceleration of growth in recently published articles and have attracted funding. These new Topics are derived from existing stem Topics and are formed by the new citation relationships that have appeared in the last year. Once a year, the SciVal Topics algorithm is run to identify the new Topics that have emerged [35].
Like SciVal Topics, the InCites Topics classification is also done at the document level. It is based on a CWTS algorithm [27] that considers the citation relationships (cited and citing) between documents and the "strength" of those relationships. In this way, clusters are created at three levels: macro, meso and micro topics.
An independent analysis, based on scientific communities or clusters and on the relationships between them established through citations and main keywords, has also been carried out in this research.
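The kind of community detection involved can be sketched in Python on a toy citation graph (a conceptual illustration only; the document identifiers and citation pairs are placeholders, and CWTS uses its own Leiden algorithm rather than the Louvain method shown here):

import networkx as nx
from networkx.algorithms.community import louvain_communities

# Placeholder citation pairs (citing_id, cited_id); in practice these come
# from the reference lists of the retrieved documents.
citation_pairs = [
    ("doc1", "doc2"), ("doc1", "doc3"), ("doc2", "doc3"),
    ("doc4", "doc5"), ("doc5", "doc6"), ("doc4", "doc6"),
]

# Build an undirected graph; link weights could encode the "strength" of the
# citation relationships, but here every citation counts equally.
G = nx.Graph()
G.add_edges_from(citation_pairs)

# Modularity-based community detection groups strongly connected documents
# into clusters, analogous to the macro/meso/micro topics or the keyword
# communities discussed in this work.
communities = louvain_communities(G, seed=42)
for i, community in enumerate(communities):
    print(f"Cluster {i}: {sorted(community)}")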
Finally, continuing with the issue of quality, the sources (journals) have been analyzed with the following metrics:
On the other hand, the analysis of the sources has been completed with two other metric values:
Field-Weighted Citation Impact (FWCI) in SciVal [38];
Category Normalized Citation Impact (CNCI) in InCites [36].
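Both indicators follow the same normalization logic: the citations actually received are compared with the citations expected for documents of the same field, publication year, and document type, so that a value of 1 corresponds to the world average. In simplified form:

\[
\text{FWCI (or CNCI)} = \frac{\text{citations received by the publication}}{\text{expected citations for publications of the same field, year, and type}}
\]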
6. Conclusions
This study has analyzed the bibliometric documents produced between 1996 and 2020. It has been observed how bibliometrics was applied to research in all scientific fields during these years. To evaluate these documents, a methodology has been used that has proven valid for relating scientific production in Scopus and WoS and linking it to bibliometric indicators through SciVal and InCites.
The first conclusion drawn from this work is that there was exponential growth in publications between 2000 and 2020 and that most of the documents are indexed as articles (72% in Scopus and 68% in WoS), as opposed to reviews (13% in Scopus and 14% in WoS). Three countries lead in the number of documents published: China with 16%, the USA with 15% and, in third place, Spain with 12.5%. In this sense, it is worth highlighting the position of Spain in third place, alongside the two large countries with the highest scientific production in absolute terms.
From the point of view of the institutions, there are differences between the two databases analyzed. However, the top five positions in the ranking are shared by the same institutions: University of Granada, University of Valencia, Consejo Superior de Investigaciones Científicas (CSIC), Chinese Academy of Sciences and Leiden University. Once again, the predominance of Spanish institutions in this ranking stands out. International collaboration is undoubtedly a parameter that reveals the synergies in scientific production. In this case, it has been shown that the institutions in the top five positions of the ranking do not exhibit a parallelism between quantity of production and international collaboration: their international collaboration is around 30%, below the average of 45% for the remaining institutions.
Regarding the topics where bibliometrics is applied, the publications have been categorized, and despite the differences between Scopus and WoS in classifying publications, the results show that these studies have been classified mainly in the areas most related to bibliometrics. According to Scopus, in order of importance: Social Sciences, Computer Science, Medicine, Business, Management and Accounting, Engineering, and Environmental Science. According to WoS: Information Science and Library Science, Computer Science, Environmental Sciences, and Management. There is also a high degree of interest in the application of bibliometrics to other disciplines as a means of analyzing their own progress.
Completing the review of the topics, the SciVal Topics derived from Scopus have been considered as an indicator of where the publications on bibliometrics stand out. Here, too, the trend shows the predominance of topics related to the discipline addressed in this research: Hirsch Index, Self-Citation, and Journal Impact Factor are the predominant Topic Names in SciVal, and Publications, Periodicals as Topic, and Research are the predominant Topic Cluster Names. Interestingly, the Topic Name with the most citations per document is Social Science and Humanities; Research Evaluation; Book Publishers, with an average of 45 citations per document, and the Topic Cluster Name with the most is Decision Making; Fuzzy Sets; Models, with 23 citations per document.
In InCites, the publications are mostly included in the Macro Topic of Social Sciences, with an average of 14 citations per document, and in the Meso Topic of Bibliometrics, Scientometrics and Research Integrity; with respect to citations per document, however, the Meso Topic of Artificial Intelligence and Machine Learning stands out (19 citations per document). Among the Micro Topics, the main one by number of documents is Bibliometrics, but in terms of citations per document Fuzzy Sets stands out above all, with more than 30 citations per document. In other words, in citations per document the computer science topics stand out.
The analysis of the sources shows that, despite the different indexing criteria of JCR and SJR, there is variety in the categories in which they have been indexed. The first positions by number of publications are occupied by journals specialized in bibliometrics, but journals specialized in Medicine or the Environment also appear among the top 20. In terms of quartile ranking, a greater number of journals are positioned in Q1 and Q2 in SJR than in JCR, undoubtedly due to the different indexing criteria applied by the two databases. To complement the quartile ranking, impact factors, and citation level, two metrics have been used that assess the performance of the sources on the basis of the citations received and those expected to be received: the InCites CNCI shows that 7 of the 20 journals are below 1, and the SciVal FWCI shows that 9 of the 20 are also below this threshold.
In the analysis of the Medicine category alone, it has been observed that 136 countries have contributions in this field. The main countries are the United States, China and the United Kingdom. In the field of medicine, the main research areas studied were: Epidemiology, Pediatrics, Orthopedics, Cardiology, Neurosurgery, Radiology, Ophthalmology, Oncology, Plastic Surgery and Psychiatry.
With respect to the Environmental Sciences category, less international dissemination has been found, with only 83 countries having worked in this field. The main ones are China, Spain and the United States. Regarding the top 10 institutions, it can be stated that only Spain and China are relevant: Spain focuses on sustainability and China on the environment. In the field of Environmental Science, the main research areas studied were: Sustainability, Sustainable Development, Climate Change, Ecology, Environmental Impact, Biodiversity, Environmental Protection, Environmental Management, Public Health and Environmental Monitoring.
The citation relationships between the publications have made it possible, through an independent analysis, to establish keyword clusters based on citation level. These seven clusters were: Science Mapping, Research Productivity, Medicine, Environmental Sciences, Psychology, Nursing and Engineering. In the seven communities from which the 20 main keywords were extracted, a predominance of terms related to bibliometrics applied to the different clusters was again observed. The main country keyword data have also been extracted, highlighting the relevance of China as the predominant country in four of the seven clusters analyzed. The independent analysis of the indexing categories of the journals shows that Medicine and Environmental Sciences are the most relevant areas in the field of bibliometrics, after Social Sciences and Computer Science.
In conclusion, many parameters can be used to trace the evolution of bibliometric studies over the period under analysis. In this case, bibliometric data and indicators have been used to study the evolution of the discipline over the years and the performance of its publications. In any analysis it is important to start from the objectives of the study in order to apply the appropriate metric values. In this sense, the recommendations established in the Leiden Manifesto and the San Francisco Declaration should not be forgotten, so that the metrics used to assess scientific production are applied correctly.