Digital Service Quality Measurement Model Proposal and Prototype Development

Erhan Sur 1,* and Hüseyin Çakır
Department of Computer Technologies, Gerze Vocational School, Sinop University, Sinop TR57600, Türkiye
Department of Computer and Instructional Technologies Education, Gazi Faculty of Education, Gazi University, Ankara TR06500, Türkiye
Digital Economy Research Center, Azerbaijan State University of Economics (UNEC), Baku AZ1001, Azerbaijan
Sustainability 2024, 16(13), 5540
Submission received: 14 May 2024 / Revised: 17 June 2024 / Accepted: 26 June 2024 / Published: 28 June 2024
Researchers have criticized traditional survey-based service quality models, such as SERVQUAL and SERVPERF, for operational errors in their application, high implementation costs, and respondent recall problems. Additionally, these models face difficulties when applied to different sectors, as they were developed for the retail industry. Their adaptation, data collection, and data processing steps have become outdated in comparison to current information processing technologies. With the rise in the use of social media, a new communication paradigm has emerged in which people and institutions communicate directly through social media. Institutions analyze social media data using text mining and sentiment analysis methods to keep up with this change. There are studies in the literature proposing new methods for measuring service quality by using text mining and sentiment analysis techniques separately. In this study, these two techniques have been combined, in the belief that the combination will result in a more robust service quality measurement model. Additionally, an application has been developed to demonstrate the functionality of the model. A municipality was specifically chosen as the application area because social media allows for fast, efficient, and inclusive participation between citizens and the municipality. The proposed model will enable the better identification of service quality deficiencies, leading to a more efficient use of municipal resources and fostering a more sustainable understanding of the municipality. With the implementation of the model, 463,886 tweets sent to the @ankarabbld and @mavimasa accounts were analyzed to identify 10 service quality dimensions and 106 keywords representing these dimensions, which would reveal the municipality’s service quality.
The sentiment analysis technique was applied to 187,084 tweets containing the identified keywords. Thus, an attempt was made to uncover the municipality’s service quality.

1. Introduction

Service quality studies began within the framework of total quality management in the 1970s [1], but the emergence of pioneering studies took until the 1980s [2,3,4]. From a bibliometric perspective, the most frequently used models in service quality studies are found to be SERVQUAL models. These models are criticized for their structuring according to the retail sector, making their application in different sectors problematic. Additionally, it is argued that the conceptualization and measurement of service quality rely on faulty paradigms [4]. The models developed by researchers in the 1980s typically employed survey techniques. The sample size, response rate, reliability of the response rates, and operational errors during implementation make these models disadvantageous [5], and they are considered unsuitable for contemporary data processing technologies.
There is a need for a more effective method than traditional service quality models to understand people’s desires and evaluate the services provided. Considering the data generated by social media usage and current information processing technologies, it is believed that a new service quality measurement model can be proposed. Service quality studies utilizing a sentiment analysis in the literature began in 2015 [6]. Notable research comprises studies conducted in the hotel, airline, transportation, banking, and healthcare sectors [7,8,9,10,11,12]. These studies have not integrated the extraction of keywords from user reviews using text mining with a sentiment analysis. A new model is proposed in which the opinions of service users are evaluated through a sentiment analysis by combining these two techniques. The motivation for this study is the belief that a more robust model, compatible with contemporary information processing technologies, can be developed by identifying service quality dimensions from data related to the relevant sector using text mining techniques and performing a sentiment analysis on the data containing these keywords. The research aims to answer two questions: “How can a service quality measurement model be developed with sentiment analysis and text mining?” and “How can a prototype for service quality measurement be developed?”
The methodology employed to achieve the research objective is illustrated in Figure 1.
To achieve the objective of the study, a municipality was selected to ensure data volume and demonstrate the accuracy and applicability of the model. Virtual networks created through information processing technologies enable citizens to rapidly form new public spaces and actively participate in decision-making processes within these newly formed public areas [13]. Through virtual networks such as X, citizens quickly create new public spaces. The platform where people in regional communities can interact and share information, experiences, and mutual benefits is called a digital city. A digital city is a social information infrastructure for urban life [14]. In the public sector, social media provides fast, efficient, and inclusive engagement channels for citizens to dialogue with public officials. This is particularly evident at the municipal level, where there is a high degree of proximity between citizens and the government and a high likelihood of citizens participating in public affairs. Engaging citizens through social media offers municipalities low-cost opportunities for better public relations, increased accountability and transparency, and the ability to implement collective problem-solving in policy-making [15]. Social media sites such as X, which facilitate reciprocal communication, act as a quick, effective, and low-cost bridge between digital city citizens and public administrators. The number of people using social media to express opinions about government policies or local administrators is increasing daily. For these reasons, it is believed that a municipality would serve as a good prototype in the proposed new model. In this study, a service quality measurement model is proposed, which involves extracting keywords from social media data and conducting a sentiment analysis on posts containing these keywords, regardless of the sector. A prototype has been developed and applied to a municipality as a case study.

2. Conceptual Background

The theoretical framework of the research consists of the concepts of service quality, social media, sentiment analysis, and text mining. For this reason, these concepts are introduced first. While explaining them, we also try to reveal the relationships between these concepts.

2.1. A Brief Overview of Service Quality and Municipal Service Quality

Studies on municipal service quality began approximately 20 years after the initial studies on service quality [16,17,18,19]. Despite using traditional models, such as SERVQUAL and SERVPERF, in these studies, researchers mention different service quality dimensions, different types of services, and different scale questions to present service quality for municipalities. Similarly, in studies conducted in Turkey, it has been observed that the scale questions differ in the adaptation of the SERVQUAL model [20,21,22,23]. A study conducted by examining the literature with municipal employees and experts, considering the municipal law, also identified different quality dimensions [24]. The challenges and costs associated with applying traditional service quality models, along with operational errors, the specification of different service types instead of the five service dimensions in municipal service quality measurement, and the use of different scale questions indicate a lack of consensus among researchers. Additionally, it is believed that the scale questions derived from the adaptation of traditional models do not encompass all types of services provided by the municipality. For these reasons, it is believed that the service quality scales in the literature cannot fully analyze the services provided by municipalities. The types of services provided by municipalities vary depending on the countries they belong to. For example, in the United States, education, healthcare, and police services are managed by municipalities, whereas this may differ in European countries. Therefore, when determining service quality dimensions, it is important to consider citizens’ opinions and identify their expectations.

2.2. Text Mining and Keyword Extraction

Data can be recorded, rearranged, analyzed, and represented by signs such as symbols, letters, and numbers obtained from facts through reason, discussion, or calculations [25]. Text mining is an application area of data mining [26]. Text mining extends data mining to textual data [27]. Text mining is an inherently interdisciplinary process involving collaboration between individuals with various specializations ranging from technical sciences to the humanities [28].
The development of text mining started in the early 2000s and has brought new ideas to existing disciplinary fields. It processes large amounts of natural language textual data in computer files to structure their content and themes and make inferences [29]. Text mining techniques consist of data collection, structuring, and mining processes. This process is described in Table 1.
Keywords are a sequence of one or more words that represent a text’s content. Keywords mean the primary content of a text in a summarized form. Keywords are widely used to define queries within the knowledge extraction method, one of the text mining methods, as they are easy to identify, review, remember, and share [30].
Jones and Paynter [31] developed a system that lists the documents related to a keyword and links documents to one another through keyword hyperlinks, allowing users to access content quickly. Similarly, Gutwin and Paynter [32] developed a system for identifying keywords in a document and enriched the presentation of results with these keywords. Keywords can be extracted through a statistical evaluation of the texts in the document. This is performed by comparing word frequency distributions in a text with distributions in a reference corpus. Selecting statistically distinctive words for an index dictionary has positive results [33]. Some keywords are unlikely to be statistically outstanding in the corpus, so word combinations may also need to be considered when the corpus statistics are built. A syntactic filter should be used to select keywords; words that are frequently repeated but meaningless should be excluded [34]. Once possible keywords have been identified, keyword associations should be revealed. The relationships of possible keywords can be shown graphically or with matrix tables. Keywords are determined by calculating the frequency of keywords in the document and the degree of association with other possible keywords. Software such as MAXQDA can process textual data, perform qualitative data analyses, and provide a linguistic approach by evaluating metrics for opinion extraction. It can organize and categorize unstructured data, search for information, test theories, and create illustrations [35]. Figure 2 and Figure 3 present the keyword association matrix and the possible keyword study identified by the researchers [30]. The numbers within the matrix indicate how many times the words co-occur. Based on the co-occurrence matrix, the probability of a word being a keyword is determined by dividing the frequencies of the words by the number of different words they co-occur with.
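The co-occurrence scoring described above can be illustrated with a minimal Python sketch. It is not the procedure implemented in MAXQDA, nor necessarily the exact algorithm of [30]: the candidate phrases are invented for illustration, and the score follows only the description in the preceding sentence (word frequency divided by the number of different words the word co-occurs with). Scoring a word that never co-occurs with another word as 0 is an assumption, since the text does not define that case.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_scores(phrases):
    """Return (frequencies, co-occurrence counts, scores) for candidate words.

    phrases: list of candidate phrases, each given as a list of words.
    score(word) = frequency(word) / number of distinct co-occurring words,
    which is an illustrative reading of the description in the text.
    """
    freq = Counter()
    cooc = defaultdict(Counter)  # cooc[a][b] = times a and b share a phrase
    for phrase in phrases:
        freq.update(phrase)
        for a, b in permutations(phrase, 2):
            cooc[a][b] += 1

    scores = {}
    for word, count in freq.items():
        partners = len(cooc[word])
        # Assumption: a word with no co-occurring partner gets a score of 0.
        scores[word] = count / partners if partners else 0.0
    return freq, cooc, scores


# Hypothetical candidate phrases, not data from the study.
phrases = [
    ["road", "repair"],
    ["road", "maintenance"],
    ["road", "repair", "delay"],
]
freq, cooc, scores = cooccurrence_scores(phrases)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, freq[word], dict(cooc[word]), round(scores[word], 2))
```

The nested `cooc` mapping plays the role of the matrix table: each row holds the counts of how often a word appears together with every other word.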
MAXQDA software encodes, classifies, and maps textual data and identifies text relationships [36]. Various software, such as CAQDAS, MAXQDA, Quirkos, Transana, etc., are used in computer-aided text analysis. Computer-aided software is preferred in text mining due to its simplicity and the techniques it incorporates [37]. In this study, MAXQDA Analytics Pro 2022 was selected to extract keywords using these methods.

2.3. Municipalities and Text Mining

When the terms “municipality” and “text mining” are searched in the Web of Science (WOS) database, a study conducted by [38] can be seen, which involves the qualitative analysis of relevant records of Japanese municipalities with certified geoparks using the text mining method. In the Scopus database, there is also a study by Kohsaka and Matsuoka (2015) and a study by [39] that present a framework based on information technology and text mining to reveal satisfaction with Tehran municipal services. Apart from these databases, the following studies are found in the literature.
Researchers indicate that, compared to surveys and traditional research methods, the large amount of social media data with time and location information published by ordinary people and the automated tools available to process these data support urban planning decision-making in a broader spatiotemporal context. For this reason, they analyzed WeChat and X data for the Institute of Urban Planning and Design of Beijing Municipality using the text mining method to help institute managers improve their social perception and expectation abilities and support their decision-making processes [40]. A method based on computer-aided text mining was used to evaluate the climate action plans of 16 municipalities in the regional center of the state of Saxony, Germany. The researchers preferred text mining because it combines qualitative and quantitative approaches. Thus, they claimed that they overcame the time efficiency limit for large case numbers [41]. With the acceleration of urban development and increasing population, the amount of solid waste that the municipality has to handle has also increased significantly. To recycle solid waste more efficiently, a mandatory policy for solid waste segregation has been introduced in China. To examine the attitudes towards this decision taken by the municipality, researchers analyzed the comments of Sina Weibo users. They used the text-mining method to analyze the comments. The researchers argued that public sentiment studies could serve policymakers and practitioners as a political guide in social development [42]. Considering these studies in the literature, it is seen that the text mining method is used for municipal service quality and is also beneficial to citizens and administrators.

The Role of X in Text Mining (X Formerly Known as Twitter)

In many recent studies, X has been used as a resource in various fields, such as predicting political preferences, determining the effectiveness of a service policy, and monitoring infectious diseases and public health crises [43]. In a 2014 study, it was reported that X is the most popular social media application in terms of use by local governments and generates high activity in terms of users liking content or following accounts [44]. In the context of public administrators, X is used to engage stakeholders in dialogue and build relationships. X provides unique features for public administrators [45]. Municipalities must ensure citizens are involved in decision-making processes for practical city management. For this reason, X data should be analyzed and classified for strategic use [46]. Local governments do not systematically evaluate the messages written to them on X. They do not have a strategic method for using X. X has become the focal point of information retrieval and text mining due to the large amount of unstructured textual data it generates [47]. The study’s data source is X in terms of the suitability of its usage characteristics.
With the development of mobile applications and internet infrastructure, X allows people to share information anytime and anywhere and has become a platform where users evaluate the services they receive. Because user content is richer than the data collected with traditional survey-based methods, X has become a valuable source of information [48]. The size of these data and the fact that they are composed of natural language cause various difficulties in analyzing them. Researchers use artificial intelligence techniques, such as sentiment analyses and text mining methods, to overcome these difficulties in processing social media data [49].

2.4. Sentiment Analysis and Its Use in Service Quality

A sentiment analysis analyzes people’s opinions, emotions, evaluations, appraisals, valuations, attitudes, and feelings about products, services, companies, individuals, tasks, events, topics, and their characteristics [50]. Similarly, product managers use sentiment analyses to improve user experiences and satisfaction scores and analyze product and service quality [51]. A sentiment analysis or opinion mining is used to accurately predict whether a tweet is positive, negative, or neutral [52]. Recent years have seen a tremendous increase in sentiment analysis applications. Organizations and companies leverage these opinions to build their customer base or gain insight into their services and products. Thus, sentiment analysis applications have spread to almost every field, from product reviews to health services, stock market forecasts, political strategies, and elections [53].
In a sentiment analysis, it is first necessary to determine whether the sentence is objective or subjective. While no action is taken for objective sentences, the polarity of subjective sentences must be determined as positive, negative, or neutral. Subjectivity classification separates sentences expressing factual information from sentences describing personal opinions and ideas. For example, the sentence “This is a telephone” is an objective statement, whereas “This phone is perfect” is a subjective statement. There are three levels in a sentiment analysis: the document level, the sentence level, and the unit and feature level [54]. At the document level, an inference is made for the whole document, which is considered to express an opinion about a single entity. At the sentence level, each sentence is evaluated as positive, negative, or neutral. At the unit and feature level, sentiments are considered to have a target [55].
Microsoft Azure is one of the leading cloud service providers. It first started to provide services in February 2010. It serves its users as a service platform and infrastructure [56]. Microsoft Azure Machine Learning covers cloud services that enable developers to create, deploy, and manage applications through Microsoft’s global data center network [57]. This service also supports multiple machine-learning algorithms related to regression, classification, and clustering. It allows for the customization of models using Python and R. In addition, it enables modules and datasets to be designed using the drag-and-drop method. In this way, users can create their own models [58]. Azure Sentiment Analysis, provided by Microsoft Language Services, uses machine learning techniques in a cloud architecture and enables sentiment analyses at the document level, at the sentence level, and for specific features within the document. Azure Sentiment Analysis is used in various applications, such as social media monitoring, customer feedback analyses, and product review analyses. Microsoft reports that Azure Sentiment Analysis has reached 90% accuracy.
Microsoft Azure Language Services was chosen for this study due to its success in sentiment analyses, as noted by researchers.
To identify service quality studies using the sentiment analysis method, the search query ‘TI = (“sentiment analysis”) AND TS = (“service quality”) AND DT = (Article)’ was used in the WOS database. With this query, 23 articles were found. As can be seen in Figure 4, these studies were conducted between 2016 and 2022. Considering the number of studies and the starting year, it is seen that the use of a sentiment analysis in service quality is still very new. Service quality is used in many areas, especially the private sector, and its importance is increasing in a competitive environment. The low number of studies on the use of sentiment analyses gives opportunities to researchers in this field [59].
When the disciplines in which studies using a sentiment analysis in service quality studies are examined, it is seen that the fields of management and business come to the forefront. Figure 5 shows the discipline distribution of the identified studies.
Table 2 presents 23 studies in which a sentiment analysis measured service quality [59].
Upon examining the studies presented in Table 2, it is evident that researchers have employed sentiment analysis techniques for measuring service quality across different sectors. They have utilized social media data or user reviews as their data sources. The results of sentiment analyses indicate that high service quality scores have a positive impact on sales. Researchers have highlighted the inadequacies of traditional models in measuring service quality and aimed to overcome these shortcomings. They have suggested that sentiment analysis techniques can provide a more comprehensive and viable alternative to survey-based models.
In the digitalized lifestyle, people express their feelings in writing on social media platforms, websites, blogs, and forums. This situation also affects research in social sciences. The sentiment analysis method was born in computer science and has developed in different disciplines. Efforts to reach service quality by analyzing social media content with a sentiment analysis have been put forward in various sectors. Considering the size of the data analyzed in the studies presented in Table 2, the volume of data to be analyzed in this study will also address a different aspect compared to the existing studies in this field. Additionally, this study proposes a more robust model by utilizing both text mining and sentiment analysis techniques together. Moreover, no model specifically developed for municipalities has been observed in the literature. Therefore, a municipality has been selected as the application area for this study, making the research’s application aspect innovative.

Evaluation of Sentiment Analysis Model Success

The success of the proposed model is related to how accurately the content sent by users is labeled. Therefore, confusion matrices have been used to demonstrate the model’s success. Confusion matrices are the primary tool used in supervised machine learning to evaluate errors in classification problems. They give the number of misclassified items by comparing the predicted classification results with the previously known results. When the result of a sentiment analysis is labeled as positive when it should be labeled as negative, it is called a type 1 error. When the result is labeled as negative when it should be labeled as positive, it is called a type 2 error. A type 1 error is denoted as a “false positive, FP”, and a type 2 error is denoted as a “false negative, FN”. Positive labels predicted without error are represented by “true positive, TP”, and correctly predicted negative labels are represented by “true negative, TN” [76]. From the labeled data, the accuracy, precision, and recall (sensitivity) are calculated, together with the F1 value, which is the harmonic mean of precision and recall; these values reveal the model’s success [77].
$$\mathrm{Accuracy}\ (A) = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision}\ (P) = \frac{TP}{TP + FP}$$
$$\mathrm{Recall}\ (R) = \frac{TP}{TP + FN}$$
$$F_1\ \mathrm{value} = \frac{2 \cdot P \cdot R}{P + R}$$
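As a worked illustration of the four measures above, the following Python sketch computes them from the counts of a confusion matrix. The counts are hypothetical and are not results from this study; returning 0 when a denominator is zero is a convention chosen here, not something stated in the text.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```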

3. Methodology

The proposed model is presented in Figure 1. According to Figure 1, the proposed model includes the steps of selecting the data source, extracting keywords using text mining, and applying sentiment analysis. In accordance with these steps, X was chosen as the data source. The role of X in service quality measurement is discussed within the conceptual framework. MAXQDA Analytics Pro 2022 was used for keyword extraction from the obtained data. Microsoft Azure Language Services was utilized for sentiment analysis due to its high-performance results. After identifying the keywords, the service dimensions containing these keywords were determined. By analyzing the number of tweets containing the keywords, the importance of each service dimension was established. A web interface was developed to display the results. These steps are shown in Figure 6.
In the proposed service quality measurement model, sentiment analysis was performed simultaneously by Azure Sentiment Analysis as the data were obtained, thanks to the developed integration. This feature provided flexibility in the implementation of the model.
After the obtained tweets were sentiment analyzed, the confusion matrix proposed by [77] was created to demonstrate the model’s success.

4. Development of the Prototype

A prototype was developed to test the functionality of the theoretically proposed service quality model. The municipality was chosen as the area to apply the prototype due to the absence of a study combining sentiment analysis and text mining in the literature. X was selected as the data collection platform because it facilitates communication between citizens and municipalities and serves as a suitable source for applying text mining and sentiment analysis. Sentiment analysis was applied to the collected tweets and the extracted keywords, and a web interface was developed to observe the results. This section will describe these steps.

4.1. Data Collection

To create a rich data source for the research, it was considered appropriate to choose a metropolitan municipality. The Ankara Metropolitan Municipality was preferred because Ankara is the capital of Turkey and because the municipality was given the “City of Mayor” award in 2021 for its service quality. In the research, the data sent to the @ankarabbld and @mavimasa X accounts between 1 January 2016 and 30 April 2023 were collected to provide a rich and long-term dataset. Figure 7 shows the data collection process.
The data collection program was written in the Python programming language using the Tweepy library and was authenticated with an X application key. The program used to obtain the data is given in Table 3.
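For readers unfamiliar with Tweepy, a collection step of this kind might look roughly like the sketch below. It is not the authors' program from Table 3: it assumes Tweepy v4, the X API v2 full-archive search endpoint, a bearer token with the corresponding access level, and a mention-based query. The function names `build_query` and `collect_tweets` are hypothetical.

```python
from datetime import datetime, timezone


def build_query(accounts):
    """Build a search query matching tweets that mention any of the accounts."""
    return "(" + " OR ".join(f"@{name}" for name in accounts) + ")"


def collect_tweets(bearer_token, accounts, start, end, limit=1000):
    """Yield (id, created_at, text) for tweets mentioning the accounts.

    Assumes the third-party `tweepy` package (v4) and full-archive
    search access; the import is local so build_query works without it.
    """
    import tweepy

    client = tweepy.Client(bearer_token=bearer_token, wait_on_rate_limit=True)
    pages = tweepy.Paginator(
        client.search_all_tweets,
        query=build_query(accounts),
        start_time=start,
        end_time=end,
        tweet_fields=["created_at"],
        max_results=100,
    )
    for tweet in pages.flatten(limit=limit):
        yield tweet.id, tweet.created_at, tweet.text


# Accounts and date range as stated in the text.
QUERY = build_query(["ankarabbld", "mavimasa"])
START = datetime(2016, 1, 1, tzinfo=timezone.utc)
END = datetime(2023, 4, 30, 23, 59, 59, tzinfo=timezone.utc)
print(QUERY)
```

Calling `collect_tweets("<bearer token>", ["ankarabbld", "mavimasa"], START, END)` would then stream results page by page; whether the original program used this endpoint or an earlier API version is not stated in the text.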

4.2. Data Processing

A Cognitive Service resource, an Azure Language Service, and a Language Storage Service must be created to process the obtained tweets with Azure Sentiment Analysis. After the Cognitive Service resource is created, the Language Storage Service must be set up to save the data from X in cloud storage. Microsoft Azure Language Services performs the labeling shown in Table 4 when analyzing sentiment. If a tweet contains more than one sentence, each sentence is scored separately according to its sentiment, and the tweet is labeled according to the final overall average. Scores range between 0 and 1 according to how strongly the text expresses the given label.
Table 5 shows the sentiment analysis example labeling.
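The labeling step can be sketched in Python with the `azure-ai-textanalytics` client library. This is an illustrative assumption about how such an integration could be written, not the integration developed in the study: the helper names, the language code `"tr"`, and the batch size of 10 documents per request are choices made here.

```python
def make_client(endpoint, key):
    """Create a Language service client (requires `azure-ai-textanalytics`)."""
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    return TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))


def label_tweets(client, tweets, language="tr", batch_size=10):
    """Return one record per tweet with its sentiment label and scores.

    `client` is any object exposing analyze_sentiment(documents, language=...),
    such as the client returned by make_client.
    """
    records = []
    for start in range(0, len(tweets), batch_size):
        batch = tweets[start:start + batch_size]
        results = client.analyze_sentiment(batch, language=language)
        for text, doc in zip(batch, results):
            if doc.is_error:
                # Keep failed documents so positions still match the input.
                records.append({"text": text, "label": None, "scores": None})
                continue
            scores = doc.confidence_scores
            records.append({
                "text": text,
                "label": doc.sentiment,  # document-level label
                "scores": {
                    "positive": scores.positive,
                    "neutral": scores.neutral,
                    "negative": scores.negative,
                },
            })
    return records
```

A call such as `label_tweets(make_client(endpoint, key), tweets)` returns, for each tweet, the document-level label together with the three confidence scores between 0 and 1; sentence-level results are also available on each result object through its `sentences` attribute.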

4.3. Keyword Extraction

Keywords were extracted from the collected data by text mining in order to determine the service quality dimensions. Keywords are determined by calculating the frequency of keywords in the document and the degrees arising from their association with other possible keywords [30]. Keywords were obtained with the MAXQDA Analytics Pro 2022 program. To improve the keyword results, the following items, which would create interference, were excluded: link addresses inserted in the text, X tags, words with fewer than three characters, and 3198 high-frequency words that are not related to service quality and were written to create an agenda on X or to advertise. After excluding these items, 1,576,997 words remained, of which 12,286 were distinct.
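The exclusion rules above can be approximated with a short Python sketch. It is only an illustration, since the study performed this step in MAXQDA: the regular expressions, the interpretation of "X tags" as mentions and hashtags, and the tiny stop-word set are assumptions. Python's default lowercasing also does not follow Turkish rules for the letters I and İ, which a real pipeline would need to handle.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # link addresses
TAG_RE = re.compile(r"[@#]\w+")                # mentions and hashtags (assumed "X tags")
WORD_RE = re.compile(r"[^\W\d_]+")             # runs of letters, including Turkish ones


def tokenize(tweet, stop_words=frozenset(), min_len=3):
    """Strip links and tags, then keep lowercased words passing the filters."""
    text = URL_RE.sub(" ", tweet)
    text = TAG_RE.sub(" ", text)
    words = (w.lower() for w in WORD_RE.findall(text))
    return [w for w in words if len(w) >= min_len and w not in stop_words]


def word_frequencies(tweets, stop_words=frozenset()):
    """Count the remaining words over all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet, stop_words))
    return counts


# Hypothetical tweet and stop words, not data from the study.
sample = ["@ankarabbld Yol çok bozuk, yol onarımı ne zaman? https://t.co/abc #ankara"]
counts = word_frequencies(sample, stop_words={"çok", "zaman"})
print(sum(counts.values()), "words,", len(counts), "distinct:", dict(counts))
```

The two printed totals correspond to the two figures reported above: the number of remaining words and the number of distinct words.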
While determining the keywords, their single frequencies; their double, triple, and quadruple word combinations; their usage in context; and their relations with each other were arranged within the framework of the Metropolitan Municipality Law, T.R. Official Gazette, 25531, Law No. 5216 dated 23 July 2004, which determines the powers and responsibilities, establishment, organs, management, and working procedures and principles of municipalities, and the municipal service dimensions put forward in the studies conducted by [22,23,24] mentioned in the literature.
While identifying keywords, the method described in the study by [30] was taken into consideration. Since Turkish is an agglutinative language, word roots were identified, and their frequencies were combined. Service quality dimensions were established by mapping the relationships between the words, as shown in Figure 8, and by building a co-occurrence matrix, as seen in Table 6. Because the collected data are in Turkish, the keywords in the code maps appear in Turkish. Their English equivalents are given in Table 7. This section only presents the street–road–high street maintenance and repair service. Text mining processes for other service quality dimensions are provided in Appendix A.

4.4. Designing Web Interface

A web interface was added to the measurement model to observe the service quality dimensions. This way, relevant keywords can be searched in tweets, and the sentiment analysis tags and values received by tweets can be checked. Thus, the measurement model can be quickly adapted for different sectors. It is thought that searching the constituent keywords among the tweets obtained, obtaining sentiment analysis scores, averaging the scores, determining the number of tags, determining user authorization, uploading and deleting tweets, and changing the database when desired will make it easier for the web interface to adapt to different sectors. Therefore, these features were added to the web interface.
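The keyword search and score averaging offered by the web interface can be sketched as follows. The study's interface is implemented in PHP with MySQL; this Python version only illustrates the logic, and the record layout (`text`, `label`, `scores`) and the sample tweets are hypothetical.

```python
from collections import Counter


def keyword_summary(tweets, keyword):
    """Summarize labeled tweets whose text contains the keyword.

    tweets: iterable of dicts with 'text', 'label', and 'scores'
    (a dict of positive/neutral/negative values between 0 and 1).
    """
    needle = keyword.lower()
    hits = [t for t in tweets if needle in t["text"].lower()]
    labels = Counter(t["label"] for t in hits)
    mean_scores = {}
    if hits:
        for name in ("positive", "neutral", "negative"):
            mean_scores[name] = sum(t["scores"][name] for t in hits) / len(hits)
    return {"count": len(hits), "labels": dict(labels), "mean_scores": mean_scores}


# Hypothetical labeled tweets for illustration only.
labeled = [
    {"text": "Otobüs yine gecikti", "label": "negative",
     "scores": {"positive": 0.1, "neutral": 0.1, "negative": 0.8}},
    {"text": "Yeni otobüs hattı için teşekkürler", "label": "positive",
     "scores": {"positive": 0.8, "neutral": 0.1, "negative": 0.1}},
    {"text": "Park çok temiz", "label": "positive",
     "scores": {"positive": 0.9, "neutral": 0.05, "negative": 0.05}},
]
summary = keyword_summary(labeled, "otobüs")
print(summary)
```

Because the keyword list is an input rather than part of the code, swapping in the keywords of another sector is enough to reuse the same summary.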

4.5. System Infrastructure Architecture

The system infrastructure was Linux-based: the Ubuntu 20.04 operating system was installed on the Linux 5.4.0-153 kernel. Apache was used as the web server, MySQL 8.0.34 was used for database management, and the PHP 7.4 programming language was used for the scripts in the web pages that execute user requests on the server side.

5. Findings

During the data collection phase, 463,886 tweets sent to the @ankarabbld and @mavimasa X accounts between 2016 and 2022 were collected. When text mining was applied to these tweets, 10 service dimensions and 106 keywords representing these dimensions were identified. A total of 187,084 tweets contain these keywords. Table 8 shows the service dimensions and the keywords. Table 9 shows the municipal service quality dimensions and the number of tweets containing keywords related to these service dimensions.
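The counting behind a table of tweets per service dimension can be illustrated with a brief Python sketch. The dimension names, keywords, and tweets below are hypothetical examples rather than entries from Table 8 or Table 9, and plain substring matching is a simplification: it tolerates Turkish suffixes but can also match unrelated longer words.

```python
def tweets_per_dimension(tweets, dimensions):
    """Count tweets containing at least one keyword of each dimension.

    dimensions: dict mapping a dimension name to a list of lowercase keywords.
    A tweet counts once per dimension, however many keywords it contains.
    """
    counts = {name: 0 for name in dimensions}
    for text in tweets:
        lowered = text.lower()
        for name, keywords in dimensions.items():
            if any(keyword in lowered for keyword in keywords):
                counts[name] += 1
    return counts


# Hypothetical dimensions and tweets for illustration only.
dimensions = {
    "public transportation": ["otobüs", "metro"],
    "road maintenance": ["asfalt", "kaldırım"],
}
tweets = [
    "Otobüs gelmedi, metro da kalabalık",
    "Asfalt çalışması ne zaman bitecek?",
    "Metro durağı yanındaki kaldırım bozuk",
]
dimension_counts = tweets_per_dimension(tweets, dimensions)
print(dimension_counts)
```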
Since the keywords representing the service dimensions are directly extracted from user-generated content through text mining, they are believed to better represent the service quality dimensions than the keywords identified in studies measuring service quality using a sentiment analysis, such as those presented in [8,11,60,64,72]. Additionally, it has been observed that in service quality measurement studies in the literature, such as [7,8,9,10], the dimensions of the SERVQUAL model have been adapted as service dimensions. One benefit of the keywords generated from text mining is that they allow for the creation of more detailed, sector-specific service quality dimensions. In this regard, these findings are considered to be stronger in measuring service quality.
The number of tweets subjected to a sentiment analysis for measuring service quality is provided in Table 9. Considering the volume of data analyzed in the studies mentioned in Table 2 such as [6,9,11,64,67], it can be said that the data handled in this study are significantly larger. Analyzing more data helps to reduce the impact of type 1 and type 2 errors on the model’s success in a sentiment analysis.
Table 10 presents the sentiment analysis results related to the service quality dimensions.
The factors that reveal the service quality after the keywords are searched in the web interface are given in Table 11.
According to the sentiment analysis, 47,152 negative tweets and 34,200 positive tweets were identified. Negative tweets outnumber positive tweets in all service dimensions except fire department services, countryside development services, and water and sewerage services. Because citizens usually tweet about services they are dissatisfied with, the high number of negative tweets is to be expected. Although 10 service dimensions were identified for the municipality, significantly more tweets are posted about some dimensions than about others. A larger number of tweets about a particular service dimension indicates its importance to citizens. For example, far fewer tweets are sent about funeral services than about public transportation or social services, as funeral services are not used as frequently in daily life. Extracting keywords to determine the service dimensions therefore has a further benefit: it makes it possible to identify the types of services that matter most to citizens. Each service dimension of the municipality is defined by the keywords that constitute it, and a single service dimension can cover various types of services. For example, social services encompass not only the maintenance of parks and gardens but also the courses offered by the municipality. The same applies to other types of services. In the proposed system, the developed web interface allows keywords to be analyzed separately, which enables a detailed analysis within each service dimension.
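The relationship between keywords and service dimensions can be sketched as a roll-up of sentiment labels, first per keyword and then per dimension. The Python example below is a hypothetical illustration. The assignment of keywords to dimensions, the label names, and the sample texts are assumptions and are not taken from the tables of this study.

```python
from collections import Counter

# Assumed excerpt of a dimension-to-keyword mapping (illustrative only).
DIMENSIONS = {
    "public transportation": ["otobus", "numarali"],
    "road maintenance": ["asvalt", "cukur", "yama"],
}


def tally(tweets, dimensions):
    """Count sentiment labels per dimension and per keyword.

    tweets is an iterable of (text, label) pairs. A tweet is counted once per
    dimension even when it contains several keywords of that dimension.
    """
    by_dimension = {name: Counter() for name in dimensions}
    by_keyword = {kw: Counter() for kws in dimensions.values() for kw in kws}
    for text, label in tweets:
        lowered = text.casefold()
        for name, keywords in dimensions.items():
            matched = [kw for kw in keywords if kw in lowered]
            if matched:
                by_dimension[name][label] += 1
            for kw in matched:
                by_keyword[kw][label] += 1
    return by_dimension, by_keyword


# Illustrative data only; these are not tweets from the study.
sample = [
    ("asvalt bozuk, cukur dolu yol", "negative"),
    ("yama yapildi tesekkurler", "positive"),
    ("otobus gelmedi", "negative"),
]

by_dimension, by_keyword = tally(sample, DIMENSIONS)
for name, counts in by_dimension.items():
    print(name, dict(counts))
```

Keeping the per-keyword counts alongside the per-dimension totals mirrors the idea that different service types within one dimension can be examined separately.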
The confusion matrix, which demonstrates the model’s success, was proposed by [77]. The matrix for the developed model is provided in Table 12.
According to this matrix, the calculated accuracy is 0.86047, the precision is 0.95556, the sensitivity (recall) is 0.72881, and the F1 score is 0.82692. These values indicate that the sentiment analysis is quite successful.
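For reference, the four reported measures follow from the cells of a binary confusion matrix by the standard definitions. The Python sketch below applies them. The cell counts used in the example (43 true positives, 2 false positives, 16 false negatives, 68 true negatives) are an assumption: they were inferred as one set of counts that reproduces the reported values and were not copied from Table 12.

```python
def metrics(tp, fp, fn, tn):
    """Compute standard measures from binary confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Assumed counts, inferred from the reported metrics (not taken from Table 12).
m = metrics(tp=43, fp=2, fn=16, tn=68)
for name, value in m.items():
    print(f"{name}: {value:.5f}")
# accuracy: 0.86047
# precision: 0.95556
# sensitivity: 0.72881
# f1: 0.82692
```

The gap between precision and sensitivity means that, under these assumed counts, predictions of the positive class are rarely wrong, while a noticeable share of actual positive-class items is missed.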

6. Discussion, Conclusions, and Future Research

In this study, a new service quality measurement model is proposed using sentiment analysis and text mining techniques. To demonstrate the applicability of the theoretical proposal, a prototype was developed. A municipality was chosen as the area of application because of its suitability for service quality studies and the limited number of studies in this field in the literature. The selected municipality was deemed appropriate for implementing the model because it had received a service quality award and was a good data source. The first phase of the proposed model involves selecting the data source for sentiment analysis and text mining and obtaining the data. Since citizens communicate directly with the municipality via X, X was selected as the data source. A total of 463,886 tweets sent to the municipality’s X accounts between 2016 and 2022 were obtained. The second phase involves applying text mining techniques to the obtained data to extract the service quality dimensions and the relevant keywords. In this phase, the MAXQDA Analytic Pro 2022 software was used for the analysis. Since the tweets sent by users were not solely related to service quality but also included advertisements, marketing, and propaganda, 3803 words were excluded to avoid interference. When determining the keywords, word frequencies, word combinations in pairs, triplets, and quadruplets, and word co-occurrences were analyzed. During these stages, word association maps and co-occurrence matrices were created. As a result, 10 service quality dimensions and the 106 keywords constituting them were identified. In the final phase, a sentiment analysis was conducted on the tweets containing the identified keywords to determine the service quality of the municipality.
The survey-based SERVQUAL model includes five service quality dimensions. It was developed with 200 participants who received services from four sectors in the retail industry. In addition to the operational errors and adaptation challenges mentioned in the literature, its data collection and processing methods lag behind current technological advancements. For these reasons, researchers have proposed the new service quality measurement models summarized in Table 2. These studies have recommended text mining and sentiment analysis techniques and discussed their usability. Researchers who used sentiment analysis to measure service quality in various sectors have compared it with traditional models and concluded that sentiment analysis is an innovative and applicable method. In this study, the two techniques used in service quality measurement models in the literature have been combined to create a more robust approach. Additionally, with the developed prototype, desired keywords can be searched through the web interface, and the relevant tweets can be observed. This allows service quality dimensions and service types to be evaluated independently and yields more detailed results when identifying service quality shortcomings.
Researchers who measured airport service quality using sentiment analysis and text mining [8] have indicated that this method is innovative and applicable. These researchers extracted keywords from data obtained from X based on an existing scale. In this study, however, keywords were extracted directly by analyzing the content submitted by users. Additionally, this study collected a larger dataset over a longer period.
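The frequency and co-occurrence analysis of the second phase can be illustrated with a short Python sketch. The study performed this step with MAXQDA, so the code below is only a simplified stand-in. The function names, the whitespace tokenization, and the sample documents and excluded words are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def tokenize(text, excluded):
    """Split on whitespace, case-fold, and drop excluded words."""
    return [w for w in text.casefold().split() if w not in excluded]


def ngram_counts(docs, n, excluded=frozenset()):
    """Count sequences of n adjacent words (n=2 pairs, n=3 triplets, ...)."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc, excluded)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def cooccurrence(docs, excluded=frozenset()):
    """Count unordered word pairs that appear in the same document."""
    pairs = Counter()
    for doc in docs:
        vocab = sorted(set(tokenize(doc, excluded)))
        pairs.update(combinations(vocab, 2))
    return pairs


# Illustrative data only; these are not tweets or exclusions from the study.
docs = ["yol cukur dolu", "yol cukur yama", "kampanya indirim yol"]
excluded = {"kampanya", "indirim"}

print(ngram_counts(docs, 1, excluded).most_common(2))
print(ngram_counts(docs, 2, excluded).most_common(1))
print(cooccurrence(docs, excluded).most_common(1))
```

In this sketch, excluded words are removed before the n-grams are formed, so words separated only by an excluded word become adjacent. Whether that is desirable depends on the analysis and is a design choice of the example.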
The service quality dimensions identified through the application of the proposed model to the municipality have both similarities and differences with the dimensions presented in the literature on service quality studies. For instance, in the study conducted by [22], funeral services and fire services were not mentioned, while road and street maintenance and repair services and zoning services were combined. Similarly, in the study by [21], it was observed that road and street maintenance and repair service, funeral service, fire service, rural area service, and inspection service dimensions were not examined, whereas social services, physical characteristics, and zoning services were investigated. In the studies conducted by [17,18], only the five dimensions of the SERVQUAL and SERVPERF models were considered as service quality dimensions. The study by [19] showed that many service dimensions were combined under the dimension of city service management.
A web interface was designed to observe the analyzed data in the development of the service quality measurement model prototype. The web interface utilizes a Linux operating system infrastructure and a MySQL database. Within the web interface, the specified keywords can be searched both together and separately. This feature adds flexibility to the service quality measurement model, allowing for a detailed analysis related to the specific service dimension. This prototype can also be quickly adapted to different sectors. Once the data source for the sector in which service quality is to be measured is determined and the relevant keywords are extracted from the data, the desired analyses can be performed through the web interface.
By applying the proposed model to the municipality, key terms that are typically difficult to predict and not present in the literature have been identified. Some of these terms are misspelled words, such as “otobus” (bus), “numarali” (numbered), “asvalt” (asphalt), and “harfiyat” (excavation). Others are colloquial terms used in spoken language, such as “köstebek” (mole), “çukur” (pit), “çamur” (mud), “yama” (patch), and “toz” (dust). Terms that tie service quality to geographical conditions, such as “tuz” (salt) and “küreme” (plowing), were also identified. It is believed that the method will provide a new perspective to the literature in this regard. For a municipality located in a region where it does not snow in winter, citizens would not use the terms “salt” and “plowing”. Because the key terms and the service quality dimensions are derived from users’ posts, they will adapt to geographical differences or changes in legal responsibilities when the steps of the proposed model are applied.
With the implementation of the prototype, it has been observed that citizens tweet more about the service quality dimensions that concern them the most. For instance, although fire services are a legally mandated service of the municipality, citizens use them only when they encounter a fire incident. Consequently, tweets about this service dimension are less numerous than those concerning other service dimensions. For municipal administrators, this result indicates which service quality dimensions are more critical and should be prioritized. In other sectors, the same result would enable managers to understand which units are more critical for the institution and which units have better or worse service quality. In the applied prototype, citizens are most concerned with street, road, and alley maintenance and repair services, followed by social services, public transportation services, and water and sewerage services. Municipal authorities can take these results into account and plan the necessary actions to improve service quality.
In this study, the proposed model has been applied to a municipality, and some of the resulting service quality dimensions are similar to those in other studies in the literature. However, service quality dimensions and key terms not encountered in other studies have also been identified. Researchers can apply the proposed model to municipalities in different geographical regions and compare the results. Similarly, researchers can apply the steps of the proposed model to different sectors to verify its suitability for those sectors, as stated in this study, or to suggest improvements.
Considering that citizens generally tweet to complain about service quality dimensions with which they are dissatisfied, it is naturally observed that the majority of the resulting posts are negative.

7. Limitations

The developed model proposes the use of social media tools as a data collection method, and the developed prototype has been applied to the metropolitan municipality of Ankara, the capital of Turkey. On social media, people might criticize municipal services for political reasons even if they have no genuine complaints, or, conversely, express satisfaction without using the services. The prototype relies on the X platform as its data source because of its high usage rates, but it only takes into account those who express their satisfaction or complaints on this platform. While the model is stated to have advantages over traditional models, it does not reflect the opinions of everyone who uses the services.
The predominance of dissatisfaction in the analyzed tweets may lead to an overrepresentation of negative sentiments, which can result in a misinterpretation of the findings.
Although removing words related to advertisements, marketing, and propaganda during text mining is thought to provide a more accurate representation of the service quality dimensions, it may inadvertently exclude words relevant to service quality.
Text mining was performed with the MAXQDA Analytic Pro 2022 software, so the generated keywords are limited by the accuracy of the results provided by this software.
The implementation of the prototype and the interpretation of the results are limited by the data collection period (2016–2022) and the trends of social media data. Due to the nature of the language used on social media, which often does not adhere to grammatical rules, the accuracy of the sentiment analysis remains limited.

Author Contributions

Methodology, E.S.; Software, E.S.; Data curation, E.S.; Writing—original draft, E.S.; Writing—review & editing, H.Ç. All authors have read and agreed to the published version of the manuscript.

Funding

TÜBİTAK supported this study with project number 223K905.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflict of interest.

