Skip to Content
ComputersComputers
  • Article
  • Open Access

11 August 2022

Transforming Points of Single Contact Data into Linked Data

and
1
General Secretariat of Information Systems, Ministry of Digital Governance, Chandri 1, Moschato, 18345 Athens, Greece
2
School of Computer Science and Informatics, De Montfort University, Leicester LE1 9BH, UK
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.

Abstract

Open data portals contain valuable information for citizens and business. However, searching for information can prove to be tiresome even in portals tackling domains similar information. A typical case is the information residing in the European Commission’s portals supported by Member States aiming to facilitate service provision activities for EU citizens and businesses. The current work followed the FAIR principles (Findability, Accessibility, Interoperability, and Reuse of digital assets) as well as the GO-FAIR principles and tried to transform raw data into fair data. The innovative part of this work is the mapping of information residing in various governmental portals (Points of Single Contacts) by transforming information appearing in them in RDF format (i.e., as Linked data), in order to make them easily accessible, exchangeable, interoperable and publishable as linked open data. Mapping was performed using the semantic model of a single portal, i.e., the enriched Greek e-GIF ontology and by retrieving and analyzing raw, i.e., non-FAIR data, by defining the semantic model and by making data linkable. The Data mapping process proved to require a significant manual effort and revealed that data value remains unexplored due to poor data representation. It also highlighted the need for appropriately designing and implementing horizontal actions addressing an important number of recipients in an interoperable way.

1. Introduction

EU citizens and businesses, especially those operating in another EU country, often struggle to understand the rules that apply to their particular case or the steps required to carry out simple procedures. Searching for information is often a tiresome and confusing process. Results tend to be scattered across different websites that often lack any guarantee of quality or reliability, and significant information gaps remain in many areas, leaving important questions unanswered. A number of procedures are still exclusively paper-based or require a person’s physical presence, which can be costly and time consuming. Cross-border users often encounter obstacles with national administrative procedures because they only work with national phone numbers, postal codes, or payment methods. Additionally, many citizens and companies are unaware of available assistance services to help them solve their problems. All these obstacles limit the consolidation of a genuine single market where the freedom of goods, services, capital, and people is fully ensured. It also hampers the establishment of a digital single market by building unnecessary online barriers between people in different EU countries. Moreover, existing information is usually published in a raw format, i.e., without following specific guidelines or being modeled in a disparate way, even though it serves a common purpose. In addition to this, it does not contain encoding that allows minimum data linkage such as JSON-LD thus, which remains unexploited. A characteristic example is the information residing in governmental portals for Directive 123/2006/EC (Directive 2006/123/EC) purposes. This situation hampers information exchange, which can be feasible using linked data technologies. Linked data technologies aim at transforming data published in web sites into a machine- readable format (usually RDF using URIs) in order for them to be linked to other external datasets.
More specifically, Directive 123/2006/EC launched in 2006 focuses on simplifying the procedure of practicing a profession by a European citizen in another member state. Each national portal must contain information regarding the required supporting documents for each service activity (for example Operation License to Tour Guide’s) in two languages, the official language of the member state and English. However, no linkage exists between relevant information inside each website, let alone linkage between websites supported by each member state. In other words, no linkage exists between “Cross border provision of services for tourism businesses” or “ Tourist offices’ notification of commencement of business” appearing in the corresponding Greek portal and “Tour operator and travel agency services” or “Tour guide services” appearing in the Hungarian one, in the case where a European citizen searches for related information.
The goal of our research was to examine how information residing in various Point of Single Contacts (PSCs) can be exchanged using RDF representation in order to be linked. This has as a prerequisite the transformation of this information into RDF triples. The transformation of this information is based on the data (or ontological) model of each Point of Single Contact (PSC), if and only if it exists. Information mapping thus results in mapping data models. In the case where, as the one examined in this paper, no data model exists for some Point of Single Contacts (PSCs), then existing model(s) are used to perform this information mapping. The current work used an already existing tool presented in a previous work [1], which transformed data residing in the Greek PSC portal (http://www.eu-go.gr/sdportal/ (accessed on 10 February 2022)) into the RDF format using as a basis the Enriched Greek e-GIF ontology [2]. The Enriched Greek E-GIF ontology is a two-layer ontology, which aims to capture and link all knowledge elements that are essential to describe service activities provided to citizens or businesses.
It should be stressed that information residing in the Greek PSC portal is modeled based on the Enriched Greek E-GIF ontology. Since we made the assumption to use as a base model for our comparison the enriched Greek E-gif ontology, we decided to expand the work performed in [1] in comparing information residing in Greek PSC with the one residing in other PSCs, where information on which data model is used is missing.
The current work aimed to reveal the difficulty of exchanging information residing in portals related to actions originated from the European Commission, aiming thus at ameliorating citizen’s everyday life. An organization can use the richness of the RDF model to capture the detailed relationships in their data and share them in multiple ways. This can only be accomplished if the information is semantically modeled in the same way (i.e., following a unique semantic data model which is designed in advance) and is transformed into RDF or JSON-LD format. The benefit to the e-government will be the possibility to provide cross border electronic services to citizens and businesses and to exchange data between member states. The importance of linking governmental data becomes more crucial in view of the forthcoming Single Digital Gateway Regulation, which exploits and reuses—among others—information residing in Point of Single Contacts (PSCs). More specifically, Single Digital Regulation states that “A number of Union acts have aimed to provide solutions by creating sectorial one-stop shops, including points of single contact established by Directive 2006/123/EC of the European Parliament and of the Council, which offer online information, assistance services and access to procedures relevant for the provision of services”.
As the ultimate goal of the Service Directive was to enable the cross boarder exertion of core public services, this goal can only be achieved if information residing into various PSCs is structured in a uniform manner. This will facilitate the exchange of public services descriptions between administrations across borders and the improvement of the searchability and accessibility of public service information for citizens and businesses. This paper used with a real case problem and tried to concisely highlight this lack of a unified way of structuring information which not only has an impact on readers but also limits information from being machine exchangeable. The value of the automated phase, having as a prerequisite the existence of a unified data model (i.e., the existence of a semantic model enabling semantic interoperability) is the discoverability and the exchange of information between systems, i.e., interoperability. From the citizen’s point of view, it would be feasible to perform a query. For example, for the exertion of the tourist guide profession. This query would be performed automatically to all PSCs and results would be retrieved from various countries. At the current moment, the user visits each PSC portal individually and performs the same query.
More specifically, our work attempted to delineate the steps required to transform information residing in five national portals containing information about Directive 123/2006/EC into linked data. The steps performed revealed that, not only does such an approach require a significant effort proving to be a tedious task, but it does not reach the goal due to a lack of uniform semantic representation (a well-adopted semantic model designed in advance) of the information residing in portals.
The structure of the paper is as follows. Section 2 provides an overview of related methods. Section 3 presents the steps performed for the creation of RDF triples in the pages originated from the Greek, Hungarian, Maltese, Slovenian and Cypriot Point of Single Contact portals. Section 4 provides a description of our experiments. Section 5 provides conclusions and future steps.

3. Methodology

As mentioned previously, the current work extended the work performed in [1], where a limited number of pages residing in the Greek PSC portal were transformed into a RDF format. Additionally, web page transformation was restricted to the ones residing in it, without taking into consideration how such a transformation can take place in pages residing in other PSC’s having as a basis the Enriched Greek E-gif ontology. The current paper, through data mapping between information residing in five PSCs, attempted to prove that this type of mapping is a tedious task. Already existing work focused on transforming information residing in the Greek PSC into RDF triples using the Enriched Greek E-gif ontology. The current work attempted to map information residing in the Hungarian, Maltese, Slovenian and Cypriot PSCs to the Enriched Greek E-gif as well as to create RDF triples, aiming at connecting related information. An alternative methodology that could have been used is the one followed by the STIR data project. However, it was not followed because the current work was an extension of a previous study. Moreover, our input was presented in an html format compared to CSV, XML or JSON-LD, which would require additional transformations.
It must be stressed that CPSV-AP ontology does not include information such as documents, prerequisites as well required steps in order to exert a service provision activity, i.e., profession, appearing in a number of Points of Single Contacts (PSCs).
Even though CPSV-AP ontology could have been designed in order to depict information found in various PSCs, thereby retrieving information contained in them in a more effective way, this was not the case. The aforementioned situation was exacerbated by the fact that information on how data is structured in various PSCs is agnostic due to the lack of a publicly available ontologies describing each of them. This led us to use the Enriched Greek E-gif ontology alone as a base to compare how information is structured in various PSCs.
The methodology followed here is in line with the GO-FAIR principles. GO-FAIR (https://www.go-fair.org/ (accessed on 10 February 2022)) defined a seven-step FAIRification process focusing on data, but also indicating the required work for metadata alignment. The FAIRification process was conceived as a set of step-by-step operations that should be performed over data and related metadata to achieve its FAIRness. By FAIRness we mean the realization of FAIR principles, i.e., Findability, Accessibility, Interoperability, and Reuse of digital assets. According to GO-FAIR, the steps involved in this process are as follows: (1) retrieve non-FAIR data, (2) analyze the retrieved data, (3) define the semantic model, (4) make data linkable, (5) assign license, (6) define metadata for the dataset, and (7) deploy/publish FAIR data resource.
Our work focused on the first five steps of the GO-Fair principles. More specifically, in alignment with [13] we placed emphasis to the following steps: (1) Raw Data Analysis, i.e., inspection of the content of the data to find out which concepts are represented, the structure within and among the data element concepts, and the storage and serialization format of the data elements. (2) Data Curation and Validation, i.e., categorization and extraction of data fields, types, and values of metadata. The curated data should be validated against known quantitative relationships and expected values and should conform to the semantic model defined for the FAIRification workflow, i.e., the EGIF ontology through a set of structural rules. Moreover, the data itself should conform to the semantic rules exposed by the data element or attribute itself. (3) Semantic Modeling, i.e., the definition of the “semantic model” for the dataset, which describes the meaning of entities and relations in the dataset accurately, unambiguously, and in a computer actionable way (the e-GIF Ontology in our case). Data curation should map the raw data conforming to such a standard-based data model. (4) Making Data Linkable, i.e., the transformation of raw data into linkable data by applying the semantic model defined in the previous step. This step promotes interoperability and reuse, facilitating the integration of the data with other types of data and systems. This information might be hosted in a dedicated portal and can act as the starting point for citizens and businesses aiming to retrieve information.
A similar methodology was followed in [14], where the authors proposed a semantic metadata mapping procedure (SMMP) in the Digital Libraries context, based on ISO/IEC 11179, which can maximize the interoperability among data elements. The procedure consists of the following three main processes: identifying metadata element sets, grouping data elements, and semantic mapping. Having an already existing semantic model, the remaining steps are critical since they manage data type mapping which involves manual effort. The task requires the correct attribution and mapping of an html tag with the most semantically correct class property, a typical problem when performing ontology data schema mapping. This task can either be tackled as an information extraction problem, i.e., named entity annotation which requires human annotation or as a machine learning classification problem which also requires human effort in order to create the appropriate training corpus. In all cases, the correct attribution of html or json-ld tag is also dependent on the structure of the raw data (in our case the html pages) as well as the quality of data residing in them. The task implies syntactic as well as semantic analysis. Manual effort was also required in [15] where the authors placed emphasis on human intervention when assigning terms to correct DDC metadata classes. The authors followed a machine learning classification approach. As can be seen, studies in the area of semantic interoperability are being actively undertaken, especially through large-scale initiatives and projects aimed at achieving automatic semantic mapping. However, as [16,17] emphasized, and as pointed out by [18], it is difficult for machines to achieve precise semantic mapping owing to the problems of disambiguating polysemous and synonymous words and senses. Without extensive human-mediated efforts that target the identification of incorrect semantic mapping, the goal of enhancing and refining semantic interoperability, even in relatively less complex information environments, will be thwarted. The following subsections explain the steps performed in detail.

3.1. Dataset

During the current work, the intention was to extend work performed in [1] to apply the toolchain to more web pages appearing in the Greek PSC, and also to expand RDF triples produced using additional Greek e-gif ontology’s properties. This is the reason why, as a first step, all pages appearing in the Greek PSC were downloaded. This resulted in 411 unique files containing descriptions of public service activities. Among them, 325 were written in Greek and 86 in English. In order to perform comparisons with information residing in other PSCs we focused on the 86 web pages written in English.
As a second step, we expanded the set of attributes extracted from each web page to form RDF triples and consequently those used for performing query search in the produced RDF triples. More specifically, during our current experiments we focused on the following properties for every service provision activity (i.e., profession): title, provision method (i.e., establishment or cross border), NACE code classification, required time, cost, responsible public body, legal framework, required documents as well as comments. Each of the aforementioned properties is related to an equivalent relation, i.e., predicate such as hasTitle, hasMethod, etc., leading thus to a number of RDF triples for every web page.
A third step was to expand our experiments to other Point of Single Contacts (PSCs). Since we made the assumption to have as a reference of comparison between PSCs belonging to different countries the Enriched Greek E-gif ontology, the purpose was to try to extract as much valuable information as possible, i.e., as many of attribute instances available.
In order to examine whether the Enriched Greek E-gif ontology can capture information residing in other PSCs, we examined all PSCs. It is commonly agreed that there is a vast specificity and diversity of electronic services of European governments. This leads to a significant heterogeneity and non uniformity on how information is structured. Our main prerequisite was the provision of information in another language apart from the country’s native language, i.e., provision of information found in English.
A second criterion was whether each PSC under consideration contained a similar professions with the ones described in the Greek PSC. The idea behind this was to limit the scope to websites that falls into limited areas or domains. In order for the comparison to make sense, information needs to be downloadable (i.e., in html format) as well as structured in a similar manner. Moreover, content should be sufficient in terms of attributes in order to make the comparison valid. Thus, a third criterion was how information was organized within a web page, i.e., whether it contained information such as cost, time, legal framework, prerequisite documents, comments and public body. Another criterion taken under consideration was the actual html code of each web page, to perform effective html parsing.
Among all PSCs, we restricted our search to the ones of Malta, Slovenia, Hungary and Cyprus. However, the Hungarian PSC fulfilled most of the criteria listed above. More specifically, in the Hungarian PSC, information is organized in a similar manner, i.e., it provides information regarding the title, administrative cost, processing time limit, list of required documents and legal framework. On the other hand, it does not contain useful information such as the NACE code. A significant difference between the Hungarian and the Greek PSC is the fact that the first contains information regarding establishment as well as cross border provision into a single web page, while in the second the information resides in two distinct web pages since the legal framework (thus the prerequisite documents) for them might be different. Another point to note is that in the Hungarian PSC there is not a unique web page layout, i.e., all pages do not present information in a unified way. This makes web page parsing more difficult. As a consequence, we extracted 15 out of 34 web pages containing information represented in a unified manner and describing professions that are similar to the ones found in the Greek PSC. Properties that appear in the fifteen pages of the Hungarian PSC that are in alignment with the Enriched Greek E-Gif ontology are as follows: ServiceCost, LegalFramework, ServiceComment, Input (which represents prerequisite documents), Title and DeliveryTime.
However, we expanded our experiments in order to explore and compare information from Slovenia, Cyprus and Malta. More specifically, we downloaded 350 pages from the Slovenian PSC (https://spot.gov.si/en/ (accessed on 10 February 2022)), 120 from the Cypriot PSC (https://www.businessincyprus.gov.cy/ (accessed on 10 February 2022)) and 143 from the Maltese PSC (https://businessfirst.com.mt/ (accessed on 10 February 2022)) for the same professions coexisting in the Greek and the Hungarian PSC. Information in the Slovenian PSC is organized following a specific structure, namely (a) title, (b) description, (c) conditions, and (d) procedures—containing a downloadable application form as well as the prerequisite attachments, i.e., documents or attestations—(e) competent authority, (f) activities, (g) legal basis, (h) license register and (i) cost. It is worth mentioning that information regarding a specific service provision and for some of the aforementioned attributes might be missing, as it was not provided. Even though we noticed a similarity compared to the Greek PSC in terms of attributes such as cost, Legal basis, etc., matches could only been achieved by using keywords originating from the title as well as the hasDocument appearing as an Attachments attribute. The hasServiceComment was mapped to the original description provided for each service provision since no dedicated attribute providing comments exists.
The Cypriot web portal organizes information in the following manner: (a) general information; (b) application submission subdivided into (b1) who is eligible and (b2) where to apply (b3) which certificates must be submitted (b4) fees applicable & how to pay; (c) decision notification; (d) licence validity period; (e) dispute with the competent authority’s decision subdivided into (e1) How to file an administrative decision and (e2) how to appeal; and (f) legislation & Obligations subdivided into (f1) which laws and regulations Apply and (f2) what are my obligations. As it can be easily seen, the way that information is structured differs from the way it was modeled using the eGIF ontology. The has Service Comment attribute was mapped to the General information part provided for each service provision.
Malta’s PSCs present some peculiarities. More specifically, it usually has a nested structure. Normally, a web page contains the following fields: (a) title (b) description (c) application form (d) applicable Law (e) documents needed with application (f) contact information. The description field was mapped in the hasServiceComment attribute, the applicable Law was mapped to hasLegalFramework while the “Documents needed with application” attribute was mapped to hasDocuments. However, it might be the case that, in the same web page, links related to one or more specific services can appear meaning that a hierarchy of services not visible to readers might exist. An example of this is “Bunkering Authorisations” (https://businessfirst.com.mt/en/starting/Pages/Bunkering.aspx (accessed on 10 February 2022)) which contains the description of two other activities, i.e., “Authorisation to Operate a Barge or a Marine Terminal/Facility” and “Authorisation to Operate a Marine Fuel Station through Dispensers Operated in the Proximate Vicinity of the Shoreline and Operating Exclusively for Ships” as well as one complementary link “Info and documents for REWS Approved Competent Persons”. Each of those sub activities redirects a user to a dedicated page which has a different structure of representing information, i.e., 1: Application for an Authorisation (Activity Name) 2: Legal References 3: Means of Authentication, Identification and Signature 4: Type of Evidence to be Submitted 5: Applicable Fees 6: Penalties and Offences 7: Deadlines or Indicative Time that the Regulator Requires to Complete the Procedure 8: Rules on Lack of Reply from the Regulator and Legal Consequences 9: Means of Redress or Appeal and 10: Competent Authority. Due to the aforementioned complexity, we made the decision to extract content appearing in the first level. Additionally, in the source code of each web page one can detect a type of services classification perhaps similar to the one of the NACE codes used in the Greek PSC. An example of this is “Child-Care-Facility” or “School Licences”.

3.2. Preprocessing Steps

As previously mentioned, we used the toolchain implemented in [1]. This toolchain involves the following steps
Step 1. HTML web page parsing—Web page retrieval: Even though information hosted in the Greek PSC portal is publicly available, no interventions, i.e., addition of RDF tags and annotations are permitted, which means that we downloaded a copy for each page. For each web page of the dataset in question, a pre-processing step, i.e., html parsing, was performed using the JSOUP library (https://jsoup.org/ (accessed on 10 February 2022)) for extracting properties of interest, i.e., the content of specific fields and to store the extracted values. JSOUP is an open source Java library used mainly for extracting data from HTML using DOM traversal or CSS selectors. It also allows for the manipulation of HTML elements, attributes, text and output tidy HTML.
Step 2. Creation of the RDF model from data produced by the previous step: After downloading and parsing web pages belonging to the dataset in question, the creation of the RDF model was performed using the Enriched Greek E-Gif ontology. More specifically, taking as input the aforementioned ontology, we created a model. As a result, class names were produced, i.e., a namespace which consists of a prerequisite for searching every class. Predicates are produced by invoking appropriate methods which take as argument every property of interest. In the RDF model that was created, a node corresponds to a class appearing in the Enriched Greek E-Gif ontology. More specifically, each service provision activity, i.e., the profession, is represented as a node (a subject in the RDF model) and is connected either with other nodes or strings through appropriate predicates such as edges forming RDF triples. Each edge corresponds to an ontology property.
For the creation of the RDF model, the Apache JENA Framework was used. Apache JENA (https://jena.apache.org/ (accessed on 10 February 2022)) is recognized by the W3C as a Semantic Web Standard for creating semantic applications. Apache JENA provides an API to extract data from and write to RDF graphs. The graphs are represented as an abstract “model”. A model can be sourced with data from files, databases, URLs or a combination of these. A model can also be queried through SPARQL. Apache JENA provides a mapping correlation of names between those appearing in an already defined ontology (in our case the Enriched Greek E-GIF ontology) with those appearing in a RDF model generated by data samples.
Step 3. Creation of RDF Triples: A subsequent step deals with the creation of the required triples using the information we derived through the application of the parser as well as class and property names derived from the ontology (steps 1 and 2). The Apache RDF API uses the notion of the namespace, which ensures the uniqueness of each entity of the model as well as the model itself in the semantic web. Apache JENA requires the use of a namespace prefix in the model’s name, as well as in each entity contained in the model (i.e., subject, predicate, and object). Once the RDF model (as well as the equivalent namespace NS) was created, RFD triples corresponding to web pages of the dataset in question were created. Apache JENA RDF API was used in order to produce the triples based on information exported by the parser when it was applied to the dataset. Nodes were created resulting in the construction of triples through the interconnection of nodes. All triples were created and added to the model in a similar manner.
Step 4. Storage of the model into a Triple Database Store: A subsequent step was model storage, accomplished using the Triple Database Store API. An RDF model is stored via TDB into a recognizable format by Apache JENA API but also by dedicated servers such as FUSEKI. The Triple Database API provides a logical unit storage of the model through which the information is retrieved using SPARQL. It uses a simple and quick way of encoding nodes of RDF relations.
Step 5. Retrieval of semantic information: As long as the model is stored in an RDF format, retrieval of desired information can take place via SPARQL queries using the Apache JENA API. Queries can be expressed using the java language by invoking specific methods provided by the JENA API. Apache JENA supports its own API that supports the SPARQL RDF Query language (SPARQL Query and RDF Query Language). SPARQL is an information retrieval language from a database structured on the RDF model. More specifically, SPARQL seeks information that meets specific criteria, but provides a description of the type of ternary relations to which desired data belong to. For SPARQL, namespace definition is necessary. The FUSEKI server serves requests in the form of SPARQL queries. FUSEKI is a SPARQL server that provides HTTP endpoints to RDF data. Table 1 provides some statistics regarding the number of pages, the number of nodes as well as the number of triples retrieved after applying the toolchain in pages (written in English) belonging to the five PSCs under examination.
Table 1. Statistics on the number of pages, nodes and RDF triples extracted from the five PSCs.
A graphical representation of the methodology followed appears in Figure 1.
Figure 1. Methodology Schema.
In order to perform the initial steps, manual source code analysis of the web pages took place. It must be stressed that all web pages examined lack useful json-ld tags that could have been beneficial for performing the semantic mapping. The side effect on this was that we had to explore the structure of each web portal in order to create a dedicated parser for each of them. Moreover, it limited our options in using other already existing tools such as XSLT (https://www.w3schools.com/xml/xsl_intro.asp (accessed on 10 February 2022)). In order to achieve semantic mapping, a hand-crafted effort was required to perform one to one mapping from html source code to the semantically equivalent e-GIF ontology properties. This was performed without the use of any mapping instructions, i.e., it required a user attribution of the semantic conceptual equivalence of an html tag to the most appropriate e-GIF ontology property, which is a typical problem in ontology-data scheme mapping. The proposed methodology can be re-used since it was based on open source code. Required adaptations and modifications depend on the input format and quality. The use of alternative ontology requires bounded adaptations.

4. Experiments

The purpose of our experiments was to examine the adequacy of Enriched Greek e-Gif ontology in representing knowledge residing to other PSCs. This can be explored by performing queries in both datasets, i.e., in the rdf triples produced after applying the toolchain presented in the previous section in web pages belonging to all PSCs mentioned in Table 1.
Manual comparison of the fifteen service provision activities selected from the Hungarian PSC to the 86 corresponding ones appearing in the Greek PSC based on their title, revealed that only 7 of them are common in both PSCs. Those common activities were used as a basis for extracting related pages to the rest of PSCs under examination.
Practically, two distinct datasets were created, with one for pages residing in every PSC. The aim was to run queries on both datasets based on keywords for the triples created (based on predicates). For comparison reasons, we performed the same queries as the one executed in both datasets in the actual PSCs web pages using the search option provided by each of them. Most specifically, in those queries, we used as keywords words appearing in each service provision activity’s title, comment and document (i.e., input). However, we are not aware of how the search engine in each PSC works, i.e., on which part(s) of the information every profession is based on. The aforementioned search was selected since no other comparison could be performed. The aim here was to try to check the validity of the automated method. Unfortunately, since data residing in the PSCs were not previously used, there was no point of providing a reference for comparison, which led us to the solution of the manual search.
The rationale behind this was to try to simulate how an end-user behaves. This is the reason why text search was introduced. The result of those queries is strongly related to the produced rdf triples, which in turn is related to the level of data mapping accomplished. This, by no means, leverages the typical strengths of having the information as Linked data, which is exactly what current research tries to highlight. For our queries, we used as a basis (starting point) the information residing in the fifteen pages of the Hungarian PSC site. More specifically, our queries were mainly based on the predicates hasServiceComment, hasTitle and hasInput. We restricted our search on those predicates since we believe that they contain the most valuable information compared to hasServiceCost, hasLegalFramework or hasDeliveryTime. The search was performed by extracting potential keywords from each of the aforementioned predicates of each of the 15 service activities of the Hungarian PSC and used them to query the 86 web pages of the Greek PSC (written in English). When performing our queries we observed that case sensitivity produced different results. Case sensitivity has a significant impact on the number of obtained results. Queries performed using keywords appearing in the hasTitle, hasServiceComment as well as hasDocument resulted in the outcomes described in the following paragraph.
More specifically, we managed to find matches in all fifteen service activities examined from the Hungarian PSC when we used keywords taken from service provision activity (profession)’s title. Matching also varied according to the service provision activity examined. More specifically, we found matches when we expanded our keywords taking information from hasServiceComment and hasDocument predicates for the following service provision activities: (1) activity directed at the organization of vocational examination, (2) the attestation activity of contributing entity involved in the preliminary vehicle identity check, (3) practicing as a private veterinarian, (4) the registration of economic organizations and their shops engaged in the trade of precious metal, jewelry, articles and ornaments, (5) statement of eligibility to practice architectural design engineering, (6) statement of eligibility to practice as a construction engineer inspector. Those service provision activities coincided with the seven found as present in both PSCs during our manual search.
Table 2 provides examples of the matches returned by querying Greek and Hungarian PSCs showing how queries are expressed in SPARQL, while Table 3 provide all queries conducted with all variations of keywords as well as matching results. The same queries were performed in the rest of the PSCs, i.e., the Slovenian, the Maltese and the Cypriot one and for the same seven service provisions focusing on the same attributes, i.e., hasTitle and hasServiceComment. Table 4 provides the corresponding matches.
Table 2. Examples of queries and matches returned by the Greek and the Hungarian PSC.
Table 3. Queries and matches returned by the Greek and the Hungarian PSC, keywords used and attributes in which queries were applied.
Table 4. Queries and matches returned by the Maltese, Cypriot and the Slovakian PSC, keywords used and attributes in which queries were applied.
From the obtained results we reached the following conclusions: while hasDocument turns out to be the least useful predicate, the most valuable ones turns out to be hasTitle and hasServiceComment. On the other hand, hasService is not as reliable as hasTitle, since it returns non-relevant documents as matches. It must be stressed that the results obtained after performing the sparql queries using specific keywords, as well as performing on site queries in both PSCs, are very similar. This means that the selection of the keywords for search was similar to the one used by the search engine implemented in all sites, even though all five PSCs and every service provision activity’s (profession) description keyword information were missing.
Even though we noticed a similarity between the Slovenian and the Greek PSC in terms of attributes such as cost, legal basis, etc., matches could only been achieved by using keywords originated from the title as well as the hasDocument appearing as the attachments attribute. The hasServiceComment was mapped to the original description provided for each service provision since no dedicated attribute providing comments exists. Slovenia PSC proved to achieve the best matches. In the Cypriot PSC, search was performed on the title as well as on the general information field that was mapped to hasServiceComment. The search resulted in a number of incorrect answers, since it returned articles that belong to News or Posts meaning pages not describing service provision details. It must also be stressed that for each query performed, the Cyprian web portal returned a maximum of 10 results. The Maltese PSC is the one with the worst performance. This is due to the complexity of its structure. For the Maltese PSC, we made the decision to extract content appearing in the first level. Additionally, in the source code of each web page one can detect a type of services classification similar to that of NACE codes used in the Greek PSC. An example of this is “Child-Care-Facility” or “School Licences”. While performing manual search on the web, this classification returned false results which may be due to the aforementioned classification. Finally, the maximum returned results when performing manual search were 45.
The obtained results are in alignment with the results obtained when performing the same queries on the search field in each portal. It must be stressed that search functionality in Greek PSC is case sensitive. Moreover, since we are not aware of which keywords are used to describe each page in the search functionality, each query performed in the Greek PSC portal resulted in more than one page, some of them being incorrect. However, the correct one(s) always appeared in the query’s outcome.

5. Conclusions

In this paper, we presented work concerning the transformation of publicly available information regarding service activity provision enforced by Directive 123/2006/EC to Linked Open Data by following the Go-FAIR principles. The purpose of performing those experiments was triggered by two factors. The first was the fact that data published as open by public administrations remains poorly exploited (which was not the intention of service directive, Directive 123/2006). The second factor is that the forthcoming Single Digital Gateway Regulation, i.e., Regulation (EU) 2018/1724, is going to be based—among other things—on the outcome of the Service Directive. At the current moment, Single Digital Gateway Regulation focuses on 21 procedures (http:data.europa.eu/eli/reg/2018/1724/oj (accessed on 10 February 2022)). However, the number of procedures might rise after 2023, when the Regulation will be implemented. Thus, the focus was not on jobs’ details across PSCs but on common information residing in PSCs.
The aim of current work was to highlight the possibility of a commonly agreed upon ontology to ensure semantic interoperability, and the transformation from each PSC of its internal representation to the common ontology in order to achieve information exchange and interoperability between PSCs. However, our work proved that this ontology does not yet exist. This is the reason why the Single Digital Gateway Regulation services model is being created. It must be stressed that the aim of this paper was to tackle a real-world problem dealing with real data and cross-border exchange information. Our ambition was to provide a solution of data mapping, however, due to data structuring heterogeneity this proved not to be feasible. Our findings prove the importance of semantic interoperability in e-Govermental portals. Semantic interoperability is the one way to ensure the exchange of information, thus it can not only exploit publicly available data but improve service provision to citizens and businesses.
This must be taken under consideration in forthcoming actions whose purpose is—among other things—to provide information on online and offline procedures and links to online procedures, established at the European Union or national level via a common user interface, which will be accessible in all official languages of the European Union.
Information residing in PSCs can be addressed to a number of recipients, i.e., European citizens. Ideally, in order to be better exploitable, this information must follow a uniform semantic representation such as json-ld, containing annotations based on a common semantic representation, i.e., ontology. ESCO ontology might be used as a starting point for this purpose, since it contains a related class, i.e., occupation. A combination of more than one of these ontologies, vocabularies or models such as CPSV-AP 3.0, ESCO as well as the Core Criterion and Core Evidence vocabulary (https://semiceu.github.io/CCCEV/releases/2.00 (accessed on 10 February 2022)) or ISA2 Core Vocabularies (Core Person, Core Organization and Core Business (https://joinup.ec.europa.eu/collection/semantic-432interoperability-community-semic/core-vocabularie (accessed on 10 February 2022)) can also prove to be beneficial.
Our vision for future work consists of four steps. The first step aims to create appropriate links, such as URI’s, between pages belonging to our core dataset, taking under consideration that for a number of pages their translated version in an English web page exists. The second step is to examine other tools to perform transformations and semantic enrichments, such as the ones proposed by the STIR data project. The third step involves the interconnection of our dataset with the equivalent ones found in the dedicated sites of other European countries, i.e., the creation of interconnection links between sites focusing on the English content of each site. In order to perform this, DBpedia ontology (http:www4.wiwiss.fu-berlin.de/dbpedia/dev/ontology.htm (accessed on 10 February 2022)) can be considered as a common ontology to use. We are also considering the creation of a tool in order to transform information residing in Greek PSCs into JSON-LD, as an upper layer, both using the Enriched Greek E-gif ontology as well as CSPV-AP in order to be used by other PSC’s.

Author Contributions

Conceptualization, P.F.; methodology, P.F.; validation, P.F. and L.M.; formal analysis, P.F.; investigation, P.F.; resources, P.F.; data curation, P.F.; writing—original draft preparation, P.F.; writing—review and editing, L.M.; visualization, P.F.; supervision, L.M.; project administration, P.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MDPIMultidisciplinary Digital Publishing Institute
DOAJDirectory of open access journals
TLAThree letter acronym
LDLinear dichroism

References

  1. Fragkou, P.; Kritikos, N.; Galiotou, E. Querying greek governmental site using sparql. In Proceedings of the 20th Pan-Hellenic Conference on Informatics, Patras Greece, 10–12 November 2016; pp. 1–6. [Google Scholar] [CrossRef]
  2. Fragkou, P.; Galiotou, E.; Matsakas, M. Enriching the e-GIF ontology for an improved application of linking data technologies to greek open government data. Procedia-Soc. Behav. Sci. 2014, 147, 167–174. [Google Scholar] [CrossRef]
  3. Loutas, N.; De Keuzer, M.; Tarabanis, K.; Alvarez-Rodriguez, M.; Burian, P. Harmonising the Public Service Models of the Points of Single Contact Using the Core Public Service Vocabulary Application Profile; Innovation and the Public Sector; IOS Press: Amsterdam, The Netherlands, 2015; Volume 22, pp. 277–284. [Google Scholar] [CrossRef]
  4. Gerontas, A.; Tambouris, E.; Tarabanis, K. On using the core public sector vocabulary (CPSV) to publish a “citizen’s guide” as linked data. In Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age, Delft, The Netherlands, 30 May–1 June 2018; pp. 1–6. [Google Scholar]
  5. Gerontas, A.; Peristeras, V.; Tambouris, E.; Kaliva, E.; Magnisalis, I.; Tarabanis, K. Public Service Models: A systematic literature review and synthesis. IEEE Trans. Emerg. Top. Comput. 2019, 9, 637–648. [Google Scholar] [CrossRef]
  6. Karamitsos, I. Chatbots for Greek/EU Public Services. Ph.D. Thesis, International Hellenic University, Thessaloniki, Greece, 2019. Available online: https://repository.ihu.edu.gr//xmlui/handle/11544/29338 (accessed on 10 February 2022).
  7. Tzagkarakis, E.; Kondylakis, H.; Vardakis, G.; Papadakis, N. Ontology based governance for employee services. Algorithms 2021, 14, 104. [Google Scholar] [CrossRef]
  8. Vassilakis, C.; Lepouras, G. Ontology for E-government public services. In Encyclopedia of E-Commerce, E-Government and Mobile Commerce; Khosrow-Pour, M., Ed.; IGI Global: Hershey, PA, USA, 2006; pp. 865–870. [Google Scholar] [CrossRef]
  9. Fonou-Dombeu, J.V. Ranking E-government Ontologies on the Semantic Web. In International Conference on Electronic Government and the Information Systems Perspective; Springer: Cham, Switzerland, 2020; pp. 18–30. [Google Scholar]
  10. Fonou-Dombeu, J.V.; Viriri, S. OntoMetrics evaluation of quality of e-government ontologies. In International Conference on Electronic Government and the Information Systems Perspective; Springer: Cham, Switzerland, 2019; pp. 189–203. [Google Scholar]
  11. Melo, D.; Rodrigues, I.P.; Koch, I. Knowledge Discovery from ISAD, Digital Archive Data, into ArchOnto, a CIDOC-CRM based Linked Model. In Proceedings of the 12th International Conference on Knowledge Engineering and Ontology Development (KEOD 2020), Budapest, Hungary, 2–4 November 2020; pp. 197–204. [Google Scholar] [CrossRef]
  12. AlSukhayri, A.M.; Aslam, M.A.; Arafat, S.; Aljohani, N.R. Leveraging the Saudi Linked Open Government Data: A Framework and Potential Benefits. Int. J. Mod. Educ. Comput. Sci. 2019, 11, 14–22. [Google Scholar] [CrossRef]
  13. Sinaci, A.A.; Núñez-Benjumea, F.J.; Gencturk, M.; Jauer, M.L.; Deserno, T.; Chronaki, C.; Cangioli, G.; Cavero-Barca, C.; Rodríguez-Pérez, J.M.; Pérez-Pérez, M.M.; et al. From raw data to FAIR data: The FAIRification workflow for health research. Methods Inf. Med. 2020, 59, e21–e32. [Google Scholar] [CrossRef] [PubMed]
  14. Lim, S.; Seo, T.; Lee, C.; Shin, S. Study on the international standardization for the semantic metadata mapping procedure. In International Conference on Database Systems for Advanced Applications; Springer: Berlin/Heidelberg, Germany, 2012; pp. 243–249. [Google Scholar]
  15. Lin, X.; Khoo, M.; Ahn, J.W.; Tudhope, D.; Binding, C.; Massam, D.; Jones, H. Mapping metadata to DDC classification structures for searching and browsing. Int. J. Digit. Libr. 2017, 18, 25–39. [Google Scholar] [CrossRef]
  16. Godby, C.J.; Smith, D.; Childress, E.R. Two paths to interoperable metadata. In Proceedings of the International Conference on Dublin Core and Metadata Applications, Seattle, WA, USA, 28 September–2 October 2003; pp. 19–27. [Google Scholar]
  17. Hegg, K.J.; Knab, A.R. Using Dublin Core to facilitate cross-collection searches in an enterprise image repository. In Proceedings of the International Conference on Dublin Core and Metadata Applications, Seattle, WA, USA, 28 September–2 October 2003; pp. 233–234. [Google Scholar]
  18. Heflin, J.; Hendler, J. Semantic Interoperability on the Web; Technical Report; Maryland University College Park Department of Computer Science: College Park, MD, USA, 2000. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Article Metrics

Citations

Article Access Statistics

Multiple requests from the same IP address are counted as one view.