1. Introduction
Described within Suber’s seminal monograph on open access [
1], the history of open scholarly communication can be traced back almost two decades. The most succinct articulation of the open access movement was found through the 2002 Budapest Open Access Initiative [
1,
2]. The clear message delivered by this declaration led to the adoption of OA policies at an institutional and funder level, with OA becoming, in recent years, a notable area of focus for librarians, research offices, and funding bodies. Such enthusiasm has resulted in an increasingly complex policy terrain whereby institutional policies exist alongside those of various funding bodies. At the time of writing, the Registry of Open Access Repository Mandates and Policies (ROARMAP) lists 928 distinct policies by research organisations and funding bodies [
3]. Most recently, UK institutions have been required to implement the Higher Education Funding Council England’s (HEFCE) OA policy for the 2021 Research Excellence Framework (REF) [
4] (from 1 April 2018, HEFCE has been replaced by a new body named Research England). This policy requires all journal articles and published conference papers to be deposited into an institutional or disciplinary repository within three months of acceptance for publication. The policy contains a series of exceptions, which include those papers which have been made OA via the Gold route—a preference for researchers in receipt of the Research Councils UK (RCUK, from 1 April 2018 now named UK Research and Innovation) grant funding [
5]. Failure to comply may see research papers ineligible for submission to REF 2021 or researchers denied future funding opportunities.
HEFCE’s policy has positively embedded OA firmly within the minds of researchers and ensured that practical steps have been taken to make publications openly available. Some library professionals, notably Kingsley [
6], have nonetheless commented on the difficulties in providing local services to researchers within a complex UK policy terrain and the international, collaborative research environment. However, so far, there has been little analysis of institutional repository (IR) workflow interactions with existing channels for open dissemination.
Alongside recent policy developments, there has also been a growth in online services that we refer to as emergent open science services. These are typically global services or platforms, used by researchers and university staff around the world to support research processes, collaboration and dissemination. The ‘101 Innovations in Scholarly Communication’ project [
7] documented many of these and demonstrated the proliferation and application of tools and services within various research lifecycles and workflows. Publishers and other private entities have developed some of these innovations while others are publicly maintained and led by research communities and libraries. A number of services enable programmatic access to data, particularly where this concerns scholarly outputs. Typically, REST interfaces enable access to resources, maximising the potential re-use and application of data. Data from these services can be used to provide insights into OA publishing and scholarship.
Commercial products that provide data on OA works include 1findr (1science) and the freemium Lantern service (Cottage Labs) [
8]. Non-commercial products include CORE (Jisc/The Open University), the Bielefeld Academic Search Engine (Bielefeld University Library), Dissemin (CAPSH), and the Open Access Button (SPARC). These organisations are based in the US, UK, Germany, and France. OA information is of intense and growing interest to many stakeholders in the global research community.
Three such services were selected to provide data for this study: Sherpa REF, Unpaywall, and CORE. Sherpa REF resources indicate possible OA locations from serial policies. Unpaywall and CORE services aggregate information on OA content and location, for example, of outputs hosted by institutional repository systems. These services were selected because of the size and type of data holdings (for example, OA/publications/serials), confidence in data quality (for example, use in mainstream research and information system tools), and the reputation of the service or organisation provider. The data they hold also aligns with the aims and objectives of the study, described towards the end of this section.
One complexity of the modern scholarly communication landscape is that individual works may be available to be accessed from several ‘OA locations’ on the internet. Unpaywall describes an OA location as ‘the particular location where a given OA article is found. The same article is often available from multiple locations. There may be differences in format, version, and licence from one location to another, even if it is the same article in all cases’ [
9]. An instance of an OA location may be where it is hosted on a publisher platform or in an institutional or disciplinary repository.
Sherpa REF [
10] holds data on specific REF compliance information taking account of REF Panel requirements. The service also holds data about the number of possible OA locations that authors have available when publishing in a scholarly journal or conference proceeding, exclusive of REF policy requirements. Therefore, Sherpa REF data indicates the potential availability of multiple OA locations for a single publication.
Sherpa REF builds on the functionality of its sister SHERPA RoMEO service (Jisc). SHERPA RoMEO is a reputable and heavily used service by the scholarly community, holding data on self-archiving policies for individual journal titles [
11]. Sherpa REF is still in Beta, but its data model quantifies machine-readable OA permissions data. This contrasts to the SHERPA RoMEO service, which provides qualitative insights through free-text fields that must be interpreted by humans during a typical permissions assessment process.
CORE is an aggregation service that harvests full-text repository content and makes metadata and outputs available via their API [
12]. CORE’s repository connector, developed for the OpenMinTeD project [
13], means they currently hold the most extensive datasets for text mining Open Access content. CORE data include geolocation fields such as country codes and physical coordinates. In 2016, the authors of this study conducted a 10 years analysis of global OA activity relating to Brunel University, London, using CORE [
14]. At the time, this resource indicated growth in Gold and Green OA beyond the visibility of local institutional systems.
Unpaywall was released in 2017 and has been quickly adopted, in part due to the popularity of the related browser extension that provides users with seamless access to OA works, but also because the reuse of the data is not restricted. The Unpaywall service independently monitors content host locations, including Gold OA journals, Hybrid journals, institutional and disciplinary repositories. A study by Bosman [
15] applied the availability of Unpaywall data in the Web of Science database to provide a detailed breakdown of OA by type for universities in the Netherlands. Piwowar and Priem (et al.) recently conducted research using Unpaywall to reveal a global ‘state of OA’ whereby 45% of recent scholarly works were found to be legally available through Gold or Green publishing models [
16]. Additionally, a similar study conducted by Universities UK found 37% of UK publications were open access in 2016 [
17]. Piwowar and Priem (et al.) comment on the fluidity of OA definitions and different subtypes identified by researchers, and the challenges this presents in delivering acceptable data to the sector [
16]. At the highest level, Piwowar and Priem (et al.) define OA as ‘free to read online, either on the publisher website or in an OA repository’ and ‘all articles not meeting this definition were defined as Closed’ [
16]. Nonetheless, Piwowar and Priem (et al.) also define several OA subtypes to classify data from Unpaywall. These classifications correspond with the terminology used by institutions and funding bodies. These are:
Gold: Published in an open-access journal that is indexed by the DOAJ (Directory of Open Access Journals—a directory indexing open access peer-reviewed journals).
Green: Toll-access on the publisher page, but there is a free copy in an OA repository.
Hybrid: Free under an open licence in a toll-access journal.
Bronze: Free to read on the publisher page, but without a clearly identifiable licence.
Closed: All other articles, including those shared only on an Academic Social Network or in Sci-Hub [
16].
Bronze is a new classification singled out by Piwowar and Priem (et al.) to highlight the emerging trend for publishers to release ‘free-to-read’ content while retaining copyrights, which corresponds with Suber’s well-known ‘Gratis’ definition [
1]. Piwowar and Priem (et al.) describe Bronze as ‘articles made free-to-read on the publisher website, without an explicit Open licence’ [
16]. This type increases access to outputs by researchers, but paradoxically contravenes reusability benefits advocated by RCUK, HEFCE, and the Charity Open Access Fund (COAF) which drive the UK agenda. The Bronze classification holds properties of the Gold and Hybrid models, in the sense that it is a comparative form of publisher hosted OA. For example, at the time of writing 403 journal titles listed in the DOAJ publish under a ‘free-to-read’ licence. These serials specify the ‘journal licence’ as ‘Publisher’s own licence.’ Piwowar and Priem (et al.) discovered that half of ‘Bronze’ serials were ‘hosted on journals that published 100% of content as free-to-read but were not listed on the DOAJ and did not formally licence content’, which they refer to as ‘Dark Gold’ [
16].
Piwowar and Priem’s (et al.) study was acknowledged by Himmelstein (et al.) [
18], where they imply that the universal adoption of a Bronze model by publishers for ‘closed’ works may remedy the dangers to the ecosystem posed by piracy sites. This landscape presents challenges for automated tools, which can check the access ‘state’ of papers at the article level, particularly regarding Hybrid publications. For example, works might become freely accessible when subscription journals release content on their platform after an embargo period (‘Delayed OA’ according to Piwowar and Priem), or promotional access, where subscription publishers enable access to content for a limited period to increase subscriber levels—a current example of this practice is the American Dental Association, who are presently offering time-limited ‘free access’ to the
Journal of Prosthodontists [
19]. Piwowar and Priem’s (et al.) results may suggest that the monitoring of OA data should be continuous to track publisher activities and ensure perpetual access to research where reuse rights are restricted. We applaud the authors for bringing great attention to this issue and agree that longitudinal research should be undertaken on a global scale to assess these trends. The availability of the Unpaywall dataset through a REST API and other mechanisms [
20] enables this level of scrutiny without manual effort, at different levels of scale.
A recent study by Martín-Martín (et al.) [
21] utilised Google Scholar (GS) as a source of data to analyse OA levels across all countries and fields of research. Data from GS is extracted with difficulty because GS does not maintain a public API. Martín-Martín (et al.) discovered OA levels comparable with other recent studies in the field, but they also provide new insights on the open availability of research through other routes—mainly due to the academic social network ResearchGate, but also personal websites and harvesters. They discovered 23.1% available as Gold, Hybrid, Delayed, or Bronze OA, 17.6% available as Green OA and 40.6% available from other sources. The study is a further indication of the multiple host locations available to researchers when disseminating their work. This study aims to enhance Brunel University London’s publication data by using software to retrieve and process data from open science services. The data produced develops from previous studies by exploring the complexities in OA choice for authors wishing to submit papers and also investigates OA volume and multiplicity. This encompasses the following objectives:
Create OA classification schemes to interpret the data; a ‘best location’ hierarchy to measure publishing trends and whether open access dissemination has taken place, and a relational ‘all locations’ dataset to examine whether individual publications appear across multiple OA dissemination models.
Investigate possible OA locations from serial policies described in Sherpa REF resources.
Investigate OA locations from CORE resources and plot the geolocation of works.
Investigate OA locations and volume from related Unpaywall and CORE resources.
From the data and classifications, this paper explores if local institutional systems capture the full range of OA publishing activity made visible by these new services and considers whether a more vibrant picture of OA activity may be recorded. HEFCE’s policy emphasises the importance of on-acceptance deposit for manuscripts and metadata for discoverability. We, therefore, apply the data capture methodology and classifications to published and accepted research outputs in the sample to provide a comparative view of OA dissemination by Brunel researchers at the point of acceptance in addition to publication.
This research is important because UK higher education institutions (HEI) are currently seeking to accurately report on both Green and Gold OA activities by their researchers. Additionally, a comprehensive understanding of OA behaviours amongst researchers is crucial for institutions as they seek to provide tailored scholarly communication training and services to their academic staff. Localised and manual processes, which are detailed and resource intensive, are currently necessary to provide an institutional picture of OA compliance. This study explores how the adoption of emerging open science services that hold OA metadata and manifestations may streamline processes and reduce the administrative burden for academics and support staff.
The data generated highlights wide-ranging engagement with OA across a variety of models. However, it also indicates potential duplication of effort in OA workflows with single publications appearing in multiple locations. From the data, we consider whether current workflows—and particularly those surrounding HEFCE’s OA Policy—may assume such heavy focus on local IRs that they do not entirely register other methods for compliance such as deposits within disciplinary repositories, the IRs of co-authors, or Gold routes. From an institutional perspective, the intense focus on IRs is understandable as this currently provides the most robust and controllable mechanism for reporting on compliance levels at a given institution. This paper does not, therefore, criticise UK institutions for their responses to the current policy environment, but instead investigates how new open science services can be applied to increase the accuracy of institutional reporting on OA compliance, especially in helping to validate existing manual work undertaken by library staff.
2. Materials and Methods
Brunel University London manages a Symplectic Elements Current Research Information System (CRIS) to record all publication and research activity by the organisation. The system links data from authority sources including ORCID, CrossREF, Scopus, and Web of Science. Records may also be input manually, thereby increasing the local visibility of research activity for disciplines where coverage in authority sources is incomplete. For the study, the institution provided a data sample from their Elements system; the complete set of journals articles and conference proceedings from the period January 2014 to December 2017. This comprised outputs of 5570 published and 403 accepted works. There is more data available for the on-publication view as the organisation began collecting on-acceptance metadata in 2016 to coincide with the activation of HEFCE’s OA policy. Works are defined as published when they have an early-online or publication issue date in the metadata. Works are defined as accepted, when they have no publication date, but have included a date of acceptance. Data were retrieved from the open science services using attributes from the Brunel sample by a small software program customised by the authors.
The Elements dataset contains some OA data attributes used in the analysis. These are:
DOAJ; matched on ISSN
SHERPA RoMEO; identifies publisher policy ‘banding’, matched on ISSN
Records and file locations from some large repositories; Europe PMC, ArXiv
Deposits in local IR (Brunel University Research Archive)
Sherpa REF service data provides possible OA locations from journals in the sample. We investigate only possible locations that enable OA and ignore embargo, cost, or funder policy restrictions. We omit ‘submitted’ article version types from the results due to the community focus on accepted/published versions. CORE service data identifies OA locations and provides additional location information such as the names of host resources where Brunel content is held and geolocation data.
Unpaywall service data identify a ‘best’ OA location hierarchy. Unpaywall qualifies a ‘best’ OA location for an OA work by prioritising publisher-hosted content first (for example, Hybrid or Gold), and then versions closest to the version of record [
9]. A ‘best location’ can help libraries answer the typical compliance question of ‘is it Gold or Green?’ All location objects that relate to works are retrieved from the service to ascertain the total OA volume. ‘Closed’ access publications are inferred where there is no discoverable OA location from any data source.
Using a small Java SE software program [
22], attributes from the data sample (DOIs and serial ISSNs) provided the inputs to retrieve resources from the Unpaywall, CORE, and Sherpa REF REST APIs and return OA data at the article or serial level. The software manages data in the publication and open access domain. The software replicates data from adjacent systems (that is, business systems like Symplectic Elements) and uses data attributes to make HTTP requests and access the resources of OA RESTful web services. This data is persisted in a MySQL relational database using the Hibernate framework [
23]. Entities and relationships are described in the data model in
Figure 1.
All software [
24] and related data outputs [
25] produced during the study are published online and described in the
Supplementary Data Section. Data analyses are the product of SQL relational queries on the MySQL database. For example, one SQL query produced an enhanced OA publication dataset by joining the Elements, CORE, and Unpaywall entities together in a relational query [
26]. Logical conditions test attribute values and this identifies the type of OA according to the classifications defined in the methodology outlined below. As mentioned in the Introduction, the study uses two classification schemes; ‘best location’, which highlights publishing trends and whether OA dissemination has taken place, and ‘all locations’, indicating the multiplicity of works across OA dissemination models.
The ‘best location’ classification extends Unpaywall’s definitions but with some fundamental differences between the Gold, Green, and Closed access subtypes (see
Table 1). The types enable us to measure publishing trends and open access dissemination.
Like Unpaywall, we monitored the OA publisher content first (Gold OA and Hybrid), followed by external disciplinary and institutional repository locations and, finally, Brunel’s institutional repository.
Unpaywall data does not explicitly distinguish Bronze OA from Gold and Hybrid classifications in its API. Rather, it attributes it in its data model (including host location descriptors, publication licences, and Gold journal indicators) and enables these types to be derived. For example, the service matches Gold locations by ISSN entries in the DOAJ. The API documentation says that this indicator for Gold OA journals is ‘useful for most definitions of Gold OA. Currently this is based entirely on the inclusion in the DOAJ but eventually may use additional ways of identifying all-OA journals’ [
9]. In our method, we subsume Piwowar and Priem’s (et al.) Bronze classification within the Gold and Hybrid types because Bronze characteristics are prevalent within both publishing models, that is, documents that are free-to-read from the publisher, but do not have an explicit OA licence. As discussed in the introduction, we do not wish to conflate Bronze with Gold or Hybrid OA. Therefore, in this study, we examine the licence data retrieved by the Unpaywall service to inform our discussions on any Bronze OA characteristics displayed by publications in the sample.
SHERPA RoMEO data, sourced within Elements, use a Yellow and Green banding scheme for serial titles, which is used to indicate whether an archival option for peer-reviewed content is available. Otherwise, we consider no compliant archival option as available. In this way, RoMEO data distinguishes closed access works between ‘publisher’ and ‘author,’ that is, where a journal policy prohibits archiving and where an author has apparently failed to take up a possible Green OA location. This distinction can help libraries target researchers to achieve OA where this is possible and highlight publisher stakeholders who are not engaged in the movement. The ‘best location’ classification is derived from the dataset by comparing the values of OA attributes from the various data sources.
Supplementary Data [
27] describes the tests performed against the attributes to derive the classification.
Unpaywall data describes unique publisher host locations and the multiplicity of repository hosts as location objects. The total number of location objects enable an OA ‘count’ of published works. To distinguish disciplinary repositories and IR locations, we excluded outputs found by Unpaywall that have an oai:bura identifier (of Brunel’s local IR—the Brunel University Research Archive). We instead use internal records via Elements to measure engagement with Brunel’s IR. This is due to embargoed works not being returned by Unpaywall and because the aim of this study is to maximise the identification of OA locations.
By establishing a comprehensive, relational dataset on all OA publishing activity discovered by the services, we can describe an ‘all location’ classification to explore the possible effects of UK’s Gold and Green policies (see
Table 2):
From this data and the classification schema, we may infer whether duplication is evident in OA dissemination for Brunel University London. UK funder policies may encourage this outcome because in combination they preference both publisher (RCUK) and IR (HEFCE) OA locations and this effect may be exacerbated by research collaborations across institutions and territories. The variance in publisher archival policies may also increase the likelihood of multiple OA locations for a single work.
4. Discussion
This study aimed to enhance Brunel University London’s publication records using data from open science services. The results show high levels of OA for the institution. The results also reveal OA activity external to institutional systems. This study resultantly contributes to two broad discussions for UK university staff involved in OA compliance reporting and service delivery.
Firstly, dialogue may focus on localised systems and whether they can be wholly relied upon to provide an accurate picture of OA behaviours amongst academic staff. Secondly, this study’s ability to use emergent open science services to enhance institutional data suggests that there may be the opportunity for greater interoperability between emerging services such as Unpaywall, CORE and Sherpa REF and established providers of IRs and research information systems.
An objective of the study was to investigate possible OA locations from serial policies described in related Sherpa REF resources. This provided insight into the complexities of OA choice prior to OA dissemination. The data in
Section 3.1 demonstrates that authors have multiple OA options available to them, which can be attributed to the success of the OA movement in producing a range of platforms for the open dissemination of research. The variety of dissemination channels available may increase the likelihood of duplication, with multiple versions of any given paper potentially existing on a mixture of disciplinary repositories, IRs, social platforms and publisher websites. Some publications in the sample offer as many as 13 possible OA locations for peer-reviewed works.
This study also generated a ‘best location’ hierarchy to measure publishing trends and to determine whether open access dissemination had taken place, and a relational ‘all locations’ dataset to examine whether individual publications appear across multiple OA dissemination models. As set out in the introduction, there are challenges when classifying OA works, particularly in light of Piwowar and Priem’s (et al.) Bronze OA definition. In
Section 3.3.1 we used licence data to investigate reuse trends across Gold and Hybrid types. The results may indicate a positive, cultural impact of RCUK policy as CC-BY is the dominant licence selected for Gold and Hybrid works.
However, as with Piwowar and Priem’s (et al.) discoveries, we found that 28% of Gold or Hybrid works in the sample display Bronze OA characteristics. Access by authors to other funding sources and the collaborative research environment may leave little recourse for intervention by institutional services. It may also be the case that in some circumstances researchers may not be granted any form of open licence by their journal. Such reuse limitations may prevent replication of the final published version in repository hosts. However, the evidence provided in this study suggests this may not be the case as we found that almost all Gold and Hybrid works in our sample are available in other OA locations, irrespective of Bronze publication indicators. It may be that, in practice, reuse is not prevented by rightsholders. The reasoning behind Gold OA publishers retaining copyrights to works is not fully understood by the sector and requires further study.
Although providing the greatest levels of access, the reusability of repository content is much less clear from the data.
Section 3.3.1 shows that 71% of repository hosts discovered by Unpaywall do not return any licence information and that non-commercial licences were the most common of the remaining set. HEFCE’s policy advises a CC-BY-NC-ND license to satisfy minimum reuse requirements [
4]. Concerning repository hosts, RCUK’s policy does not require a specific licence for green OA, only that there are no publisher restrictions on non-commercial re-use. This can be met by a Creative Commons Attribution Non-commercial (CC-BY-NC) licence. Further research may be useful to explore the effects of stricter restrictions on the use of repository contents. Such investigations may inform future policy direction.
Described in the methodology, the ‘best location’ classification is inclusive of closed OA works, distinguishing author and publisher. This is an important consideration often overlooked by OA research. In
Section 3.3.1, the ‘Closed (Green option)’ category indicates the ‘gap’ to 100% legal Green OA, accounting for just 7.7% of Brunel publications in 2016. By targeting support towards the authors of these works, institutional services may have the greatest opportunity to improve compliance figures and boost the OA agenda.
In contemplating how we may drive greater levels of access for the ‘Closed (publisher)’ category (that is, where no legal Green archival option was found to exist), we must also consider the lack of parity in access to Gold and Hybrid funding by researchers that is exacerbated by rising APC and subscription costs [
17]. Additionally, there is a specific challenge presented by small society publications, who may argue that they require subscription models to sustain their activities. Initiatives like SCOAP3, the Open Library of the Humanities, and offsetting deals like the ‘Springer Consortium agreement’ may begin to redress this imbalance by shifting costs from authors to institutions. However, financial difficulties may continue to prevent the transformation of the remaining literature to ‘open’. Advocacy activities by libraries may help authors to consider these issues when selecting their publisher.
The ‘all location’ classification examined OA location multiplicity and possibly highlights the impact of mandatory policies at an institutional level.
Section 3.3.2 found that 40% of OA publications have taken Gold and Green routes. Whilst this duplication is not problematic
per se, it does raise questions about the extent to which local institutional workflows fully capture the diversity of OA activity. The data in
Section 3.2 and
Section 3.3 show that aggregation services may help to increase institutional compliance figures by increasing the visibility of OA dissemination. Indeed, emerging open science services may run concurrent to localised workflows necessarily imposed by UK institutions to ensure compliance. CORE data shows 131 hosts that have Brunel research holdings [
29].
Section 3.3.2 found Gold works to have the highest number of OA locations, on average, followed by Hybrid. This may be due to publisher-driven archival policies. Gold models increase the likelihood that the journal will share content directly with disciplinary repositories. For example, Biomed Central mirrors all content into the PubMed platform [
34]. However, significant replication of Gold and Hybrid publications may also indicate that the policy terrain is driving a duplication of effort in access workflows. Further study may explore whether the most affected are funded authors in the UK, who have specific mandatory requirements. We found no evidence of duplication within exclusively Green works. This indicates that HEFCE’s policy may be successfully driving OA access to works that might otherwise be closed.
Presently, certain aspects of policy and the effectiveness of internal procedures are only measurable by engagement with local IRs/research information systems. This is partly because certain administrative data are not defined within global bibliographic and metadata standards. For example, the ‘date of deposit’ is not part of the RIOXX standard, and has only become important recently for UK institutions due to HEFCE’s policy and audit requirements. At the time of writing, there are no indicators that this data will be adopted within global metadata standards, as it regards system administration and has yet to become a mainstream descriptor of a scholarly work. New, existing, and growing OA cultures in disciplinary repositories and other researcher-led communities may now be disregarded by UK institutions because only internal workflows that engage with local IRs/research information systems can capture the necessary administrative data to mitigate the perceived risk to the institution.
Section 3.3.1 highlights the lack of OA information available for accepted works. This lack of information may drive a sense of risk within institutions and possibly inform manual and localised workflows and strategies. Tools like Unpaywall currently only accept DOI identifiers that are maintained by DOI providers and these correspond to published works. This particular finding may, therefore, strengthen calls for publisher engagement with the Jisc Publications Router, a service to transfer on-acceptance metadata and manuscripts between publishers and research systems. However, at the time of writing, just five publishers are listed on the website as content providers and three of these are Gold OA publishers [
35].
A further challenge facing institutions is that authors frequently change employers and import publications and records into local systems as they change jobs. Further studies could look at the impact of author mobility on OA compliance, and will probably add further weight to a need for global OA solutions. CORE data [
29,
30] show the majority of external host OA locations are spread across external IR systems in the UK/EU such as ‘Spiral’ (Imperial College London) and King’s College London’s Research Portal.
Section 3.2 and
Section 3.3 describe a spot check comparison with Google Scholar. This indicated many more OA locations available than shown within the CORE and Unpaywall data. Martín-Martín’s (et al.) research [
21] has highlighted the extent to which these OA locations are funder assured, such as disciplinary and institutional repositories, and not social networking repositories, such as ResearchGate, or uploads to project websites. Further study may explore the extent to which ‘other’ OA locations benefit the discovery of research. Large-scale and reproducible data in this area may also inform UK and other funder policies, such as ongoing reviews by Research England and the Wellcome Trust [
36,
37].
At the broadest level, this paper has highlighted a tension between the reality of research as a highly collaborative enterprise, with partnerships that cross institutional and territorial boundaries, and the requirements of nationally focused policies. This makes monitoring OA activity for compliance difficult as institutions tend to rely on local IRs and research information systems to report back to national funders. Approaches by institutions—in the UK at least—are therefore understandably driven by reactions to RCUK (now named UKRI) and HEFCE (now named Research England), with the former displaying a preference for Gold OA and the latter favouring repository usage. This study acknowledges the significance of these policies, which have been crucial in embedding OA within the minds of authors. However, the OA landscape comprises of a vast network of repositories, social systems, and open initiatives by new and longstanding publishers, which is not easily captured within local systems. The results of this paper, therefore, point to the need for interoperability between local systems and emergent open science services which are attempting to aggregate OA activity. Such two-way communication has the potential to enhance data for compliance reporting, limit the possibility of manual duplication, and free up library staff to target effort on those who have not yet engaged with OA.