1. Introduction
Public transport is often considered the primary mode of mobility in developing cities because it is more affordable than private automobiles and privately operated shared-ride transportation systems [1,2]. The service provided by public transit systems in developing cities often does not meet users’ demand and expectations [3]. The quality of public transport service has been modeled using various criteria, for example, network coverage, accessibility, affordability, safety, and cleanliness [3,4]. However, these criteria are region-specific and are influenced by the socio-economic and cultural setup, topography, and level of governance. If the criteria do not fulfill users’ needs, their perceived satisfaction level suffers. Understanding users’ perceived satisfaction towards a public transit system can help in improving the quality of service and performance measures. Information about users’ perceived satisfaction is often unavailable, which leads to mismanagement and improper allocation of transportation supply and, in turn, to a drop in quality of service. As a consequence, users experience difficulty in commuting and have limited activity space. Users’ satisfaction information is primarily collected through manual surveys [3,5], which often suffer from quality issues. The satisfaction information can also be collected through offline paper-based or online web-based questionnaires [6]. However, manual surveys are expensive, time consuming, and often suffer from under-reporting or misreporting. For example, respondents may not provide all the detailed travel information in a manual travel survey. The survey may have sampling bias. Due to the close-ended design (specific questions in the questionnaire), much relevant information (related to users’ perceived experience) may not be captured in a manual survey process.
With the emergence of ubiquitous mobile platforms and easy access to the internet, people can share their opinions and reactions towards various events in the form of user-generated content (UGC) [7]. UGC is generally produced by users on social media platforms in the form of text, images, or video. In this paper, we explore the potential of UGC to understand users’ perceived satisfaction towards a public transit system. As a case study, we chose the Mumbai suburban railway system in Greater Mumbai, India.
According to a 2018 report [8], India has 326.10 million social media users, of which Facebook accounts for 73% of usage, followed by Instagram (20%) and Twitter (2.37%). More than half (52.30%) of the social media users in India belong to the young generation, with an average age of 27, followed by generation Z (28.40%) and the 35–44 age group (15.80%). Older generations account for only 4.20% [8]. Although the socio-economic status of young commuters who are active on social media platforms is not readily available for Greater Mumbai, a study of (young) consumer behavior in Mumbai [9] observed that the majority of the young generation are graduates or postgraduates, with 32.70% having no monthly income, 19% earning less than 25,000 INR, 22% earning between 25,000 and 50,000 INR, and 26% earning more than 50,000 INR. A vast majority of the low-income population is therefore likely to prefer public transit due to its affordability [1]. Although there is a prominent gender imbalance on Twitter (16% female, 84% male) [8], the platform is increasingly used to disseminate news and personal experience [10]. For example, people post about various events on Twitter in the form of micro-blogs, also known as tweets. The events can be related to political agendas [11], disaster incidents [12], or transportation systems [13,14]. Recently, there has been a growing trend of using Twitter to share traffic conditions, road conditions, and transport service quality in major metro cities in India. This helps users adapt their travel plans more effectively. Transport authorities, in turn, learn about current traffic conditions and infrastructure issues perceived by the users. Given that the majority of social media users belong to the young generation, there is a user bias on Twitter. However, as travel experience is increasingly shared on Twitter, we aimed to investigate how users express their (dis)satisfaction towards public transit systems in Greater Mumbai by analyzing Twitter data, in order to support transport authorities in improving transport service quality.
In Greater Mumbai, people use public transport (bus or suburban railway service) for 75% of motorized trips [15], with a higher mode share for the railway system [16]. The Mumbai suburban railway system caters to almost 7.5 million users on a daily basis. On a yearly basis, it caters to 2.64 billion passengers, which makes it one of the busiest railway services in the world [17,18]. The travel demand for the railway service is generally very high, making the trains and platforms overcrowded, which often strains the mobility supply and infrastructure. To meet such high travel demand, it is important to understand the quality of service. Since the Mumbai railway service has more users than its bus counterpart, we chose the railway service to understand how users react to the public transit system in Mumbai. In addition, the author possesses local (geographical) knowledge about Mumbai suburban areas, and past studies have shown that local knowledge helps in more effective (geographic) information extraction and validation [19]. The high travel demand for the Mumbai railway system and the author’s local knowledge motivated the choice of Greater Mumbai as the study area and the Mumbai suburban railway service as the primary public transport mode to be analyzed in this research. Due to very high travel demand, the Mumbai suburban railway system often experiences overcrowded compartments, mismanagement of service, and fatal accidents [20].
Previous works investigated the feasibility of using UGC to model travel demand [21], usage of urban space [22], and road traffic incident detection [23]. To the best of our knowledge, only a few works have investigated the feasibility of using Twitter data to understand users’ satisfaction towards public transit systems [24]. If it is known from the tweets that people are dissatisfied due to delays or frequent cancellation of trains at a given location, then the authority can take proper measures to improve the service. However, the majority of tweets do not have geotag information [10]. Instead, people often mention a location context while reporting a mobility issue. The following tweets show how users report various issues at different locations.
Tweet 1: Issue is not only at Parsik Tunnel for UP fast trains but at 1. before arrival at Kalyan station 2. Before arrival at Thane station 3. Before arrival at Dadar station. Most of the fast trains are stopping at above places & at Parsik tunnel which is causing delay. Please look into…
Tweet 2: Escalator of platform no 6 & 7 of BORIVLI Mumbai not working since long. Mail n express trains comes here. Passengers with luggage have to climb foot over bridge causes problems. Authorities look in to the matter.
Tweet 3: @se_railway 1802 from CST now crammed with passengers for 1823. Dangerously overcrowded train not yet left CST. Fucking nightmare.
In the first tweet (Tweet 1), the user reported a delay of trains at Parsik tunnel. In the second tweet (Tweet 2), the user mentioned a malfunctioning escalator at Borivli, which is a misspelling of Borivali. A tweet may also be grammatically incorrect or contain abbreviations. Such informal and noisy text leads to poor performance when place names are retrieved through a simple lexicon-based lookup. In the third tweet (Tweet 3), a user reported the risk of boarding an overcrowded train at CST, which is an abbreviation of Chhatrapati Shivaji Maharaj Terminus. The user also mentioned the railway authority in the tweet to bring the situation to their notice.
In a previous work, Collins and colleagues [24] primarily investigated users’ sentiments towards the public transit system in Chicago. However, they did not attempt to extract location information from the tweets. In this work, we go beyond the existing works by not only investigating the sentiment in a tweet but also retrieving the location contexts by combining machine learning models and knowledge-based approaches. Sentiment analysis classifies a text as having negative, positive, or neutral sentiment. In this research, we considered that a tweet with negative sentiment (also called a negative tweet) expresses dissatisfaction of the user. When applied at an aggregate level, the model also provides information on the spatial distribution of service quality. This will help the transport authority to pinpoint which locations need more attention based on their frequency in negative tweets.
The primary motivation of this research is to enrich existing travel survey process by developing a UGC-based machine learning technique that can retrieve users’ perception towards transport service along with the spatial context. Although UGC is often informal, does not follow proper syntactic and lexicographical structure, and poses challenges in location retrieval, there are a number of advantages of using UGC that can add value to the existing travel survey process. The value additions are as follows.
A UGC is more free flowing. Users can express their (travel) experience, which is sometimes difficult to capture through manual surveys due to their close-ended design.
Through UGC-based approach, transport authority can monitor service quality, along with location information, either in real time or in historical manner.
In contrast to manual survey, a UGC-based approach does not need any field staff, or longer time for survey design and implementation. The UGC-based approach is more automated and can be deployed on streaming data on a specific city or multiple cities simultaneously. Due to better scalability, the UGC-based approach can save time and effort when a larger geography or multiple cities need to be monitored.
The success of a manual survey is highly dependent on the ability of the surveyors to design a questionnaire that collects the most relevant information and on the ability of the field staff to motivate the respondents. To keep the field staff focused and motivated, regular team meetings and training sessions are held, which is often tedious when performed periodically. On the other hand, a UGC-based approach can be run continuously, periodically, or around a specific time frame or event. This gives the authority more insight into how users perceive (transport) service quality at different time periods (different seasons), or during a new policy implementation (e.g., introduction of a new transport service or a rise in travel fare).
Thus, UGC can be a valuable source of information which can complement manual travel survey approach. However, as UGC is informal and unstructured, we developed a machine learning-based model that can retrieve transport service quality perceived by the users along with location information that may need attention from the authority. Since Twitter has published their application programming interface (API) to retrieve tweets, in this research, we used that API to collect tweets for further analysis.
In this paper, we aimed to address the following research questions.
Can Twitter be used to understand users’ satisfaction towards public transit system in Greater Mumbai?
How can we automatically identify tweets that are relevant to public transit authority in the context of users’ satisfaction study?
How do people characterize service criteria related to mobility and infrastructure issues?
We hypothesize that, with the growing Twitter usage in Greater Mumbai, it is possible to understand users’ satisfaction and spatial distribution of the service quality by analyzing ungeotagged tweets.
The remainder of the paper is organized as follows. Section 2 reviews the state of the art. We explain our model in Section 3. In Section 4, we discuss data preparation, experimental setups, and results. In Section 5, we address the research questions. Section 6 contains concluding remarks, limitations, and key findings.
2. Related Work
Most developing cities face mobility issues with public transportation systems, primarily due to budget constraints, lack of coordination, and the gap between users’ perceived quality of service and operators’ perception [4,25]. In India, the main causes of poor public transportation are a lack of financial resources and of proper planning [25]. To improve the service strategically with a restricted budget, it is important to prioritize the issues that need immediate attention. This can be done by understanding users’ perception of different attributes of service quality at different locations [4].
Users’ perception of the service quality of a transportation system reflects their satisfaction level. Users’ satisfaction can be estimated either at a global level (aggregate analysis over a public transit system) or at a specific level (individual analysis over a given service criterion) [26]. In the literature, satisfaction estimated at a global level is known as global satisfaction, whereas satisfaction for a given service criterion is known as specific satisfaction [26]. Currently, users’ satisfaction is studied through a manual travel survey process in which users are asked about their socio-demographic profile, commuting behavior, and satisfaction level (on a Likert scale or similar) towards the overall transit system or some specific criteria [3,26,27].
Many researchers have studied the most relevant service criteria specific to a given geography [3]. For example, Vuchic highlighted a number of service criteria, e.g., accessibility, availability, travel time, reliability of service, comfort, safety, and environmental impact [28]. Eboli and Mazzulla conducted a study on users’ satisfaction towards public bus service in Europe based on a number of criteria, e.g., service reliability, availability, comfort, cleanliness, safety and security, environmental impact, and access to trip information [29]. Ngoc and colleagues used factor analysis and linear regression to study the criteria that most influence users’ satisfaction towards the public transit system in Hanoi, and found these to be safety and security, service coverage, and comfort [3]. Dube conducted a manual survey to understand users’ satisfaction towards the Indian railway service using a close-ended questionnaire [30]. Three field staff members were deployed to interview 700 participants (including 100 railway officials) over 10 days. Of all the participants, 72% were male and 28% were female. The results showed that users expressed dissatisfaction with cleanliness in toilets and on platforms, delay of trains, and unauthorized vendors on the train. On the other hand, users expressed satisfaction with the waiting service, seating and water facilities on the platform, pricing of railway food, and fans and lighting arrangements in the trains. While understanding the most relevant service criteria for users’ satisfaction is an active area of research [31], the manual travel survey process used to collect users’ satisfaction information involves quality issues and budget constraints [32]. For example, a manual travel survey process suffers from financial constraints, a lack of field staff (interviewers), long gestation periods, and quality issues (correctness and bias in the responses). To address these shortcomings, there is a need to automate the process of understanding users’ satisfaction towards public transit systems, especially in developing cities where financial resources are scarce and communication between service providers and users is lacking. The automation can also complement the existing manual process.
With the emergence of ubiquitous information and communication technologies (ICT) and social media platforms (e.g., Twitter), people can share various information in a more dynamic way. Recent studies show that people share their perceptions and feelings about different objects and their attributes in the form of user-generated content [33]. The feeling about an entity is called the sentiment towards that entity (or any of its attributes) [33]. Sentiments are subjective and can be expressed in terms of positive, negative, or neutral polarity. In the context of quality assessment of a product or service, a negative sentiment means dissatisfaction towards the product or service, whereas a positive sentiment reflects satisfaction of the user. While sentiment analysis using UGC is yet to be practiced at a larger scale in users’ satisfaction studies, it has already been used in other domains, e.g., business management [34], movie reviews [35], socio-political studies [36], and spam detection in product reviews [37]. In the literature, sentiment analysis has been performed primarily in two ways: an unsupervised approach and a supervised learning technique [33]. In the unsupervised approach, a sentiment lexicon (also known as an affective lexicon) is used to estimate the word-level sentiment in the document, from which the document-level sentiment is computed through an aggregator function [38,39]. Generally, an average or maximum sentiment value is computed [39]. The overall sentiment type is then detected based on the aggregated sentiment value. An unsupervised approach is more suitable where the data is not annotated. Since most sentiment lexicons are manually crafted or developed for general applications, they show different inter-annotator agreement in a specific domain [40]. On the other hand, a supervised approach is used when annotated data exists to train a machine learning model. Once the model is trained with annotated sentiment data, it can be deployed to detect the sentiment of test data [33].
Limsopatham and colleagues used 1700 tweets to understand the temporal patterns of users’ reactions towards disruption of railway service in Glasgow [41]. They found that users mostly react at rush hour on weekdays and during the late evening on weekends. Congosto and colleagues used tweets generated by subway users in Madrid to detect micro events related to, among others, cleanliness and delay, using a handcrafted transport event lexicon [42]. Collins and colleagues applied sentiment analysis to 557 tweets to understand users’ global satisfaction in Chicago [24]. They used a word-level sentiment analysis approach based on a sentiment lexicon, namely SentiStrength [39], and showed that negative tweets generally outnumber positive ones. This implies that users are more likely to share negative sentiments than positive sentiments. Anastasia and Budi used 2500 annotated tweets to detect users’ satisfaction towards two popular transportation providers in Indonesia (Go-JEK and Grab) in terms of net sentiment score using supervised learning techniques [43]. Jurdak and colleagues studied human mobility patterns in Australia from geotagged tweets and found that the majority of human mobility is centered around metropolitan cities [44]. Zornoza and colleagues analyzed human mobility patterns in Valencia, Spain, using geotagged tweets and developed a model to detect users’ home location [45]. Most mobility-based research has used geotagged tweets, which account for only 0.1% to 3% of total tweet volume [10,46]. Thus, a large number of ungeotagged tweets that contain significant information provide no explicit location information.
To the best of our knowledge, no work has been done to understand users’ satisfaction, together with the location mentions associated with different sentiment types, from ungeotagged tweets. Since the ungeotagged tweets used in this research do not carry any explicit location information, we developed a model that detects tweets containing negative sentiments and extracts the concerned location mentions from the tweet contents. Since tweets are informal and contain region-specific peculiarities in location mentions, we combined a supervised approach and a knowledge-based approach, similar to Gelernter and colleagues [47], and developed a hybrid georeferencing model tuned for Greater Mumbai.
3. Methodology
To understand users’ satisfaction from UGC, the first task is to retrieve tweets that contain users’ perception and any other information relevant to the railway service. Once the relevant tweets are retrieved, sentiment analysis is performed to label each relevant tweet as carrying negative, positive, or neutral sentiment. A tweet containing negative sentiment is called a negative tweet. Similarly, a tweet containing positive or neutral sentiment is called a positive or neutral tweet, respectively. To classify a relevant tweet into a specific sentiment type, a number of machine learning models are evaluated.
Once a tweet is identified as containing negative sentiment, it is also important to understand the spatial context of that negative sentiment. Since this research aims to explore the potential of ungeotagged tweets to understand users’ (dis)satisfaction, we developed a novel georeferencing module to retrieve location information from the tweet content. The retrieved location information provides the spatial context of service quality in terms of users’ sentiment type. The entire workflow is depicted in Figure 1. The workflow can be deployed as a batch service on a cloud platform or as a standalone application. The model first retrieves a raw tweet, which is then pre-processed. Then, the model detects whether the tweet is relevant. If the tweet is irrelevant, the model retrieves the next tweet from the repository. For a relevant tweet, sentiment detection is performed, followed by georeferencing. The process iterates over the entire repository of historical tweets. It can also handle real-time tweets collected in streaming mode.
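The control flow described above can be sketched as a small generator. This is only an illustrative skeleton: the function and parameter names (`run_pipeline`, `preprocess`, `is_relevant`, `detect_sentiment`, `georeference`) are hypothetical placeholders for the modules described in the following subsections, not the names used in the actual implementation.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class Result(NamedTuple):
    text: str        # pre-processed tweet text
    sentiment: str   # "negative", "positive", or "neutral"
    locations: list  # place names extracted from the tweet content


def run_pipeline(
    tweets: Iterable[str],
    preprocess: Callable[[str], str],
    is_relevant: Callable[[str], bool],
    detect_sentiment: Callable[[str], str],
    georeference: Callable[[str], list],
) -> Iterator[Result]:
    """Yield one Result per relevant tweet; irrelevant tweets are skipped."""
    for raw in tweets:
        clean = preprocess(raw)
        if not is_relevant(clean):
            continue  # move on to the next tweet in the repository
        yield Result(clean, detect_sentiment(clean), georeference(clean))
```

Because the function consumes any iterable lazily, the same loop would serve both a stored repository of historical tweets and a live stream, which mirrors the batch and streaming modes mentioned above.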
3.1. Sentiment Analysis Using Supervised Learning Technique
Once the data is manually annotated by the volunteers, the annotated data is used to build a supervised machine learning model that can automatically identify the sentiment type in a text. As negative sentiments signify users’ dissatisfaction towards a service, we specifically focused on extracting negative tweets, which are critical to the transport authority.
Thus, we developed a two-stage classification model to categorize tweets using a supervised machine learning approach. The first classifier distinguishes relevant tweets from irrelevant tweets. Then, a second classifier categorizes each relevant tweet into one of three sentiment types (positive, negative, neutral).
Since machine learning models cannot work directly on raw text, we need to convert the text into a numerical representation that can be used by a predictive model. To do that, we first cleaned the text by removing extra white space, non-ASCII characters, and special symbols. This helps to keep the model from overfitting the training data. Following that, we removed a number of stop words, which do not bear any semantics, e.g., a, the, is, and was. In the third step, each tweet is tokenized into sentences, and each sentence is tokenized into words. Since words with similar meaning can appear in different forms in a text, we used the Lovins stemming algorithm to convert each word to its base form, thereby reducing the dimensionality of the feature space.
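The cleaning, stop-word removal, and tokenization steps can be illustrated with a short standard-library sketch. The stop-word list here is a tiny illustrative sample rather than the list used in the study, the regular expressions are assumptions, and the stemming step is omitted because the Python standard library does not ship a Lovins stemmer.

```python
import re

# Tiny illustrative stop-word list; the list used in the study is not given.
STOP_WORDS = {"a", "an", "the", "is", "was", "are", "of", "to", "and"}


def preprocess(tweet: str) -> list:
    """Clean a tweet and return its lower-cased, stop-word-free word tokens."""
    # Drop non-ASCII characters.
    text = tweet.encode("ascii", errors="ignore").decode("ascii")
    tokens = []
    # Split into sentences on ., ! or ? followed by white space,
    # then split each sentence into word tokens (special symbols are dropped).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for word in re.findall(r"[A-Za-z0-9']+", sentence):
            word = word.lower()
            if word not in STOP_WORDS:
                tokens.append(word)
    return tokens
```

A stemmer would be applied to each surviving token before feature extraction.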
Following the pre-processing phase, a Bag-of-Words (BoW) approach is used to generate a feature vector for each tweet. In this case, each tweet is considered a single document. BoW is a classical text representation used in natural language processing (NLP) in which each word is treated as a feature for a machine learning model. The BoW approach ignores grammar and the order of the words in a document. Since the machine learning model requires numerical input, the (word) features are converted to a numerical representation. To do that, we computed a numerical weight for each word token in terms of its term frequency-inverse document frequency (TF-IDF), which weighs a word based on its occurrence in a document and in the whole corpus, as follows (Equations (1)–(3)).
$$\mathrm{TF}(t, D) = f_{t,D} \tag{1}$$
$$\mathrm{IDF}(t) = \log\frac{N}{N_t} \tag{2}$$
$$\text{TF-IDF}(t, D) = \mathrm{TF}(t, D) \times \mathrm{IDF}(t) \tag{3}$$
where $f_{t,D}$ is the total count of term $t$ in document $D$, $N$ is the total number of documents in the corpus, and $N_t$ is the total number of documents containing the term $t$.
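As a minimal sketch of this weighting, the function below computes TF-IDF weights for a small tokenized corpus. It assumes the common textbook form (raw term count multiplied by the natural logarithm of $N / N_t$); the exact variant, logarithm base, and any normalization used in the study are not stated in the text, so these choices are assumptions.

```python
import math
from collections import Counter


def tf_idf(corpus: list) -> list:
    """corpus: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(corpus)  # N: total number of documents
    # N_t: number of documents containing term t (each document counted once).
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)  # f_{t,D}: raw count of term t in document D
        weights.append(
            {t: f * math.log(n_docs / doc_freq[t]) for t, f in counts.items()}
        )
    return weights
```

A term that occurs in every document (for example, "train" in a railway corpus) receives a weight of zero under this form, which is why TF-IDF favors terms that discriminate between tweets.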
3.2. Affective Lexicons for Sentiment Analysis
An affective lexicon is a vocabulary of words with given sentiment types, e.g., SentiWordNet [38], NRC Emotion Lexicon (EmoLex) [48], and SentiStrength [39]. In this research, we explored these three state-of-the-art lexicons to detect sentiments in tweets related to the railway service in Greater Mumbai. To detect the sentiment of a document, the document is first tokenized into its constituent word tokens. Then, each word token is looked up in a given lexicon to find its sentiment type (or sentiment strength). Finally, an average or maximum sentiment strength is computed over the entire document. A document can sometimes contain both positive and negative sentiment. However, in this research, we assume each tweet contains a single sentiment type. We used an aggregator function to compute an overall sentiment score for a given tweet.
3.2.1. SentiStrength
SentiStrength (SNS) is an affective lexicon developed primarily for detecting the sentiment of short texts, e.g., tweets [39]. SentiStrength originally consists of 298 positive and 465 negative words. Sentiment strengths are assigned on a scale of 1 to 5 for positive words and −5 to −1 for negative words, where 5 and −5 indicate the maximum positive and maximum negative sentiment strength, respectively. Since SNS is designed to handle typos, colloquialisms, negations, punctuation, and emoticons and their associated sentiments, it performs better on short and informal text than other lexicon-based approaches. For example, the word gooood is converted to good by SentiStrength. If a negation word, e.g., shouldn’t, don’t, or not, appears before a sentiment word, SentiStrength inverts the sentiment type of that word. For example, the word good carries a positive sentiment, but if SentiStrength encounters a negation word such as not before good, it returns an overall negative sentiment. SentiStrength also increases the sentiment strength of a word if it is preceded by a booster word, e.g., very, or followed by extra punctuation or repetition of the same word, which indicates strong sentiment. SentiStrength also supports expanding the lexicon by adding domain-specific terms with their respective sentiment strengths. In this research, we aggregated the maximum positive ($V^{+}_{\max}$) and maximum negative ($V^{-}_{\max}$) sentiment values in a tweet as follows.
$$V = V^{+}_{\max} + V^{-}_{\max}$$
If $V > 0$, we label the tweet as positive. If $V < 0$, we label the tweet as negative. If $V = 0$, we label the tweet as neutral.
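The labeling rule can be sketched as follows. The sketch assumes that the aggregate $V$ is the sum of the strongest positive and the strongest negative word-level strength in the tweet, and that the word-level strengths have already been obtained from SentiStrength; `label_tweet` is a hypothetical helper name, not part of SentiStrength itself.

```python
def label_tweet(strengths: list) -> str:
    """Label a tweet from its per-word sentiment strengths.

    strengths: integers in [1, 5] for positive words and [-5, -1] for
    negative words; words without sentiment are simply left out.
    """
    max_pos = max((s for s in strengths if s > 0), default=0)
    max_neg = min((s for s in strengths if s < 0), default=0)  # most negative
    v = max_pos + max_neg  # assumed aggregation: strongest positive + strongest negative
    if v > 0:
        return "positive"
    if v < 0:
        return "negative"
    return "neutral"
```

Under this assumption, a tweet with one mildly positive word (2) and one strongly negative word (−4) is labeled negative, whereas equally strong opposing words cancel out to neutral.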
3.2.2. SentiWordNet
SentiWordNet (SWN) is based on the semantic lexicon WordNet [49], in which terms that share a similar sense are clustered in the same group, known as a synset. The terms in a given synset have the same part of speech. Since a given term can be used in either a positive or a negative way, each term is assigned values for both positive ($V_{pos}$) and negative ($V_{neg}$) sentiment. Based on the context, the neutrality or objective value ($V_{obj}$) of a term can also be computed using the following formula.
$$V_{obj} = 1 - (V_{pos} + V_{neg})$$
Thus, the resultant sentiment value ($V_{W_i}$) of any term ($W_i$) in a tweet can be expressed as follows.
$$V_{W_i} = V_{pos}(W_i) - V_{neg}(W_i)$$
In this research, when using SWN, we computed the overall sentiment ($V_T$) of a given tweet by averaging the sentiment values of all the sentiment-bearing terms ($W_1, \ldots, W_n$) in a tweet ($T$) as follows.
$$V_T = \frac{1}{n} \sum_{i=1}^{n} V_{W_i}$$
In this research, we used SentiWordNet 3.0, which consists of a total of 117,660 terms with different senses. Unlike SentiStrength, SentiWordNet does not support spelling correction, negation handling, or boosting of sentiment strength in the presence of booster words or punctuation.
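A minimal sketch of the averaging step is given below. It assumes that the resultant value of a term is its positive score minus its negative score, and it uses a hypothetical in-memory lexicon that maps each term directly to a (positive, negative) pair; real SentiWordNet scores are attached to synsets, so a full implementation would also need to select a sense for each term.

```python
def objectivity(pos: float, neg: float) -> float:
    """Objective (neutral) score of a term: whatever is left after pos and neg."""
    return 1.0 - (pos + neg)


def tweet_score(tokens: list, lexicon: dict) -> float:
    """Average (pos - neg) over the sentiment-bearing tokens of a tweet.

    lexicon maps a term to a (pos, neg) tuple; tokens that are missing from
    the lexicon or fully objective (pos = neg = 0) are ignored.
    """
    scores = [
        lexicon[t][0] - lexicon[t][1]
        for t in tokens
        if t in lexicon and any(lexicon[t])
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with the illustrative entries `{"good": (0.75, 0.0), "delay": (0.0, 0.5)}`, the tokens `["good", "delay", "train"]` average to a slightly positive score because the unknown token "train" does not dilute the mean.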
3.2.3. EmoLex
EmoLex (NRC) is primarily an emotion lexicon that consists of 14,182 terms with approximately 25,000 senses. Each term is assigned either a positive or a negative sentiment type and a specific emotion from eight pre-defined emotion types: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. In this research, we used NRC to detect the sentiment type of a tweet by matching each word in the tweet against the lexicon. We took the most frequent sentiment type found in the tweet as its overall sentiment. Let a ‘+’ symbol denote a positive sentiment and a ‘−’ symbol denote a negative sentiment. If a tweet ($T$) consists of three word tokens with their respective sentiment types such that $T := \{W_1(+), W_2(-), W_3(+)\}$, then the overall sentiment of $T$ is positive because the positive sentiment words outnumber the negative ones.
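The majority rule can be sketched in a few lines. The lexicon passed in is a hypothetical dictionary from word to "positive" or "negative" standing in for EmoLex, and the handling of ties and of tweets without any lexicon match (both returned as neutral here) is an assumption, since the text does not specify it.

```python
from collections import Counter


def majority_sentiment(tokens: list, lexicon: dict) -> str:
    """Return the most frequent sentiment type among the matched tokens."""
    votes = Counter(lexicon[t] for t in tokens if t in lexicon)
    if votes["positive"] > votes["negative"]:
        return "positive"
    if votes["negative"] > votes["positive"]:
        return "negative"
    return "neutral"  # assumed behavior for ties and for no matches
```

With two positive words and one negative word, as in the example above, the function returns "positive".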
3.3. Georeferencing Module
To extract location mentions from tweet content, we developed a hybrid georeferencing module that consists of two layers. The first layer (Layer 1) uses supervised models, whereas the second layer (Layer 2) uses a knowledge base. The knowledge base is built from a number of spatial rules and local geographical aspects adapted to the Indian context.
Since tweets exhibit different degrees of informality, we used two supervised models in the first layer: a linear-chain Conditional Random Field (CRF) and a Maximum Entropy (MaxEnt) model. We observed that tweets generated by news channels or the transport authority are more formal than tweets generated by common users. To handle such diverse informality and lack of structure in the text, we used transfer learning in the form of a pre-trained CRF model trained on formal texts (the CoNLL-2003 data set). This pre-trained CRF model is provided in StanfordNER [50]. On the other hand, we retrained the MaxEnt model on informal tweets collected in Greater Mumbai. Thus, the two models deal with different degrees of informality in the text.
To further strengthen the performance of the model, we constructed spatial rules based on spatial prepositions and a number of vernacular names used in Greater Mumbai. These rules are used to build the knowledge base in the second layer of the georeferencing module. A location entity is generally a proper noun or common noun and appears after a spatial preposition, e.g., at, near, towards, or from.
We noticed that, in Greater Mumbai, people mention place names in different ways. For example, people use different abbreviations or multiple tokens to refer to the same place, e.g., chhatrapati shivaji terminus, lokmanya chhatrapati shivaji terminus, CSMT, and CST all refer to the same place. We also observed that people in Greater Mumbai use many vernacular names when mentioning places, for example, Tilak nagar and Raj bhavan. Generally, such vernacular names occur after a proper noun. Most of these vernacular names refer to local geographical objects. For example, nagar in Hindi means suburb or city in English.
The parts of speech (POS) of these vernacular names are generally proper noun or common noun when detected by a pre-trained POS tagger. However, due to the peculiarities of these vernacular place names, they are sometimes not detected as proper or common nouns. To make sure the georeferencing model finds legitimate place names that end with vernacular names, we developed a lexicon of potential vernacular names used in Greater Mumbai. This helps to identify whether a word is a potential place name based on the POS tag and spatial rules.
To retrieve the place names, each tweet is first fed into Layer 1, which extracts location mentions using the CRF and MaxEnt models. Then, the tweet is fed into Layer 2, which consists of the spatial rules, and further place names are extracted based on these rules. Next, a duplication check is performed to retain the unique place names retrieved from both layers. Finally, the place names are geocoded using the OpenStreetMap (OSM) Nominatim service (https://nominatim.openstreetmap.org/, accessed on 7 March 2021).
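The merging and geocoding steps can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the alias table, the choice of canonical name, the helper names, and the practice of appending a region string to the query are all hypothetical. Only the Nominatim `/search` endpoint with its `q`, `format`, and `limit` parameters is taken from the public service; the sketch builds the request URL without sending it.

```python
from urllib.parse import urlencode

# Hypothetical alias table mapping name variants to one canonical form.
ALIASES = {
    "cst": "chhatrapati shivaji terminus",
    "csmt": "chhatrapati shivaji terminus",
    "lokmanya chhatrapati shivaji terminus": "chhatrapati shivaji terminus",
}


def canonical(name):
    """Lower-case, collapse whitespace, and resolve known aliases."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, key)


def merge_layers(layer1_names, layer2_names):
    """Combine the outputs of both layers, keeping each place once."""
    seen, unique = set(), []
    for name in list(layer1_names) + list(layer2_names):
        key = canonical(name)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def nominatim_search_url(place, region="Mumbai, India"):
    """Build (but do not send) a Nominatim search request for a place."""
    params = {"q": f"{place}, {region}", "format": "json", "limit": 1}
    return "https://nominatim.openstreetmap.org/search?" + urlencode(params)


places = merge_layers(["CST", "Dadar"],
                      ["dadar", "Chhatrapati Shivaji Terminus", "Andheri"])
print(places)  # ['chhatrapati shivaji terminus', 'dadar', 'andheri']
print(nominatim_search_url(places[1]))
```

Any real use of the public Nominatim instance should follow its usage policy, for example by identifying the client and limiting the request rate.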
5. Discussion
In most developing cities, there is a gap between users’ perceived quality of the mobility service and the service providers’ perception of that service. The level of service (LOS) can be viewed from either a public transit operator’s or a user’s perspective. While users create the demand, operators provide the supply. This research bridges the gap between such supply and demand by analyzing how users perceive the LOS provided by the railway operators in Greater Mumbai. It is often difficult for the authority to recognize users’ needs and the deterioration of certain service criteria at a given place. This knowledge gap leads to quality degradation and loss of patronage. To bridge this gap, we explored whether Twitter, a social-media platform, can be used to complement the current (manual) survey practice for understanding users’ satisfaction and performance measures. Our study shows that Twitter can be used to understand users’ satisfaction with the public transit system. In this research, we developed a framework that consists of a number of modules. The first module collects Twitter data, followed by relevant-tweet detection and sentiment analysis. Most tweets are ungeotagged [10] and carry no explicit location information, which requires a georeferencing module to extract the location information. We therefore used all the ungeotagged tweets in our study. We observed that an RF classifier outperforms the other models in detecting negative tweets (0.95 recall) (Table 2). Among the three knowledge-driven (lexicon-based) models, SentiStrength (SNS2) works best at detecting (negative) sentiments in the tweets, with Kappa = 0.31 (Table 5).
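For readers less familiar with the two measures quoted above, the following sketch shows how recall for the negative class and an agreement score can be computed from labelled predictions. It assumes that Kappa denotes Cohen's kappa between predicted and annotated labels; the label lists are toy values chosen for illustration and are not data from this study.

```python
from collections import Counter


def recall(y_true, y_pred, positive):
    """Share of truly `positive` items that the model also labels `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def cohen_kappa(labels_a, labels_b):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k]
                   for k in set(labels_a) | set(labels_b)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0


# Toy annotations and predictions (hypothetical, for illustration only).
toy_true = ["neg", "neg", "neg", "neg", "pos", "pos"]
toy_pred = ["neg", "neg", "neg", "pos", "pos", "neg"]

print(recall(toy_true, toy_pred, positive="neg"))   # 0.75
print(round(cohen_kappa(toy_true, toy_pred), 2))    # 0.25
```

A high recall on the negative class means few complaints are missed, which is the property that matters when the aim is to surface as many service issues as possible.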
In this paper, we developed a novel approach to retrieve transport service quality information from Twitter-based user-generated content. The presented model can complement the existing manual travel survey process in a more adaptive and scalable way. Since the proposed approach does not need field staff or surveyors, it can be applied at any time, on streaming data or on historical data. The approach can be used across multiple cities simultaneously, at different temporal intervals, or to understand the impact of a policy change on transport service quality.
In 2018, the Ola Mobility Institute conducted a survey of 43,000 participants across 20 cities in India. According to that survey, half of the commuters in Mumbai use the public transport system to save time or money. However, when measured by the Ease of Moving Index, given a number of other commuting options, only 12% of the commuters preferred the public transport system in the first place [54]. In 2012, Dube conducted a manual passenger satisfaction survey on the Indian railway in the northern part of India and showed that users often express dissatisfaction with the cleanliness of toilets and platforms, delays in train service, and unauthorized vendors on the train [30]. Our findings align with that work, especially on cleanliness and delay. This suggests that these problems are still prevalent and common to many Indian cities, and that they require the attention of the authority. We found that people generally tweet more negative sentiments than positive ones, which also conforms to the earlier study in Reference [24]. This indicates that Twitter can be used as a source for understanding users’ (dis)satisfaction.
Figure 4 shows that users mostly express their negative sentiments during the peak hours in the morning and evening. In this research, we assume that a tweet containing a complaint with words related to the malfunctioning of railway infrastructure or mobility service expresses the user’s dissatisfaction and thus carries a negative connotation. For example, wasting water, smell from the toilet, lack of sitting facility, and malfunctioning of the fans have been used by users to indicate their lack of satisfaction with the railway service. Our study shows that users generally provide spatial context when mentioning an issue. They also mention railway ministers or other key administrative personnel in their tweets to bring the problem to their notice. The georeferencing module shows that users are concerned with various infrastructure- and mobility-related issues at Andheri, Bandra, Borivali, CST, and Dadar.
In this paper, it is assumed that, if a tweet contains a negative sentiment related to transport service, then it is of interest to the authority, irrespective of the user characteristics, as it indicates a quality issue in the transport service. Although we did not study user bias in this research, with more data, the model can be strengthened to infer users’ reactions in a more comprehensive way. Understanding the bias may reveal how different socio-demographic and economic segments use public transport and what concerns they have about various service aspects. The bias may also refer to over-estimation or under-estimation of the satisfaction level for certain service criteria by a specific socio-demographic group.
This research conforms to some of the earlier findings of References [24, 41]. For example, people generally share more negative tweets than positive tweets related to mobility services, and people are generally active in tweeting during the peak hours. However, previous work [24] used only a lexicon-based approach without any validation. In our research, we compared the accuracy of different lexicon-based and supervised machine learning models in detecting users’ (dis)satisfaction. Previous works [3, 24, 41] did not use ungeotagged tweets; thus, they did not address the spatial context of negative sentiments. In this paper, we developed a novel georeferencing module that can retrieve locations from informal tweets in Greater Mumbai and geocode them on a map, which can further help in informed decision-making and policy implementation.
6. Conclusions
In this paper, we developed a framework that can infer users’ (dis)satisfaction with the public transit system, in particular the Mumbai railway service, from ungeotagged tweets. Understanding users’ (dis)satisfaction can help the public transit authority to prioritize the service criteria or locations that require immediate improvement and thus increase patronage on public transport modes. Currently, users’ satisfaction information is collected through a manual travel survey process or through indirect means, which often involve high investment, long gestation periods, and quality issues. In this research, we developed a novel framework that can leverage UGC harvested from online social-media platform(s) and infer users’ (dis)satisfaction through sentiment analysis.
Since most tweets do not have any explicit geolocation, we explored the potential of ungeotagged tweets to understand users’ satisfaction. We compared the performance of supervised machine learning models in detecting users’ sentiments. If the aim is to detect all the negative tweets in order to retrieve as many transport-service-related issues as possible, an RF-based model should be used, which provides 0.95 recall (Table 2).
In terms of knowledge-driven techniques, an updated SentiStrength-based lexicon performs best compared to the other two lexicons (SentiWordNet, EmoLex). Because they learn patterns from the data, supervised models work better than lexicon-based models. However, when annotated data are scarce, a lexicon-based approach can be used. Other advanced machine learning techniques based on word embeddings, for example, Transformers, can also be evaluated in the future to understand people’s perception of the transportation service, given a larger data set. Since the tweets that we used in this study are ungeotagged, we developed a novel georeferencing model that can retrieve location mentions in the text, which can be used as a spatial cue for the negative sentiments towards the railway service in Mumbai.
Based on the research, we present some key points, which are critical to the public transport system, particularly the railway service in Greater Mumbai.
Ungeotagged tweets can be used to understand people’s perception of railway service quality. A negative tweet indicates a quality issue with the transport service.
An RF-based model outperforms the other models in detecting negative sentiments in transportation-related tweets, with a recall of 0.95.
People tend to tweet more negative sentiments than positive ones when expressing their (dis)satisfaction with the railway service in Greater Mumbai. That said, negative sentiments are more critical for understanding transport service quality.
We observed that people often express their dissatisfaction at the Andheri, Bandra, Borivali, and Mira-Bhayandar railway stations along the Western railway line. Along the Central line, most of the dissatisfied tweets are reported at the Thane and Mulund railway stations.
Most of the complaints are related to infrastructure and mobility issues in Greater Mumbai.
We used a comparatively small data set. Based on our findings, most of the transport-related tweets are reported at 10 a.m. and at 4 p.m. in Greater Mumbai.
User bias is not investigated in this research; further research should study what kind of user bias exists in understanding public transport quality in Greater Mumbai. It is also important to understand how the bias affects the overall findings at different granularities and in terms of different socio-demographic aspects.
6.1. Limitations and Future Outlook
Although the model proved effective in understanding users’ (dis)satisfaction with the Mumbai railway system, this research has some limitations. First, the data set used in this research was collected over a short duration. A future study should cover a longer duration, including seasonal variation and the impact of various events (e.g., political rallies, accidents, changes in rail fare, changes in petrol price) on the usage of the public transit system. The data set may have user bias, as most social-media users belong to the younger generation and the gender distribution is uneven. Further analysis can be carried out in that direction.
Although we detected users’ sentiments and the locations associated with them in this study, there are a number of other service quality parameters to be considered [3, 26]. One limitation of the proposed approach is that, in contrast to a manual survey, it is difficult to know whether a user is a rider or a non-rider. This could potentially create a user bias in the data. To distinguish riders from non-riders, future research can extract user mobility patterns from the tweets.
From the perspective of text analysis, a classical bag-of-words (BoW) model is used, which does not consider the context or the order of the words. This may affect the classification performance. A future study should compare a BoW model with more sophisticated word embedding models (e.g., Word2Vec or GloVe) that capture context and word semantics. Users can also tweet in a sarcastic manner, for example, The service is "great"!, which could actually express a negative sentiment. Users can also attach emojis to their tweets to express their feelings. In this study, we did not perform any sarcasm detection or emoji analysis, which may introduce some bias in the prediction. A follow-up study should address these limitations.
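The order-insensitivity of a BoW representation can be seen in a small sketch. The tokenizer and the two example sentences below are hypothetical and only illustrate the limitation; they do not reproduce the feature extraction used in this study.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Map a text to word counts, discarding word order and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Two hypothetical tweets with opposite complaints but identical vocabulary.
tweet_a = "The train was late, not clean"
tweet_b = "The train was clean, not late"

print(bag_of_words(tweet_a) == bag_of_words(tweet_b))  # True
```

Because both sentences map to the same counts, a classifier that sees only these features cannot tell which attribute is being negated, which is one reason to evaluate order-aware or embedding-based representations.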
6.2. Recommendations
The model presented in this research can complement the existing manual survey approach. The information on users’ perceived quality obtained with the proposed model can help the authority to better manage its infrastructure and supply. The majority of the negative tweets are associated with the Western and Central railway lines, especially at Andheri, Bandra, and Bhayandar. The authority should pay more attention to improving timeliness, cleanliness, and safety at these locations.
Based on the frequency of (dis)satisfaction at a given location, alternative services, e.g., para-transit (auto-rickshaw) or shared-ride services, can also be deployed as a gap filler, especially at locations where connecting services are frequently delayed. Based on the frequency of safety-related tweets, police and safety departments can also take appropriate measures. The model developed in this research is scalable and adaptive to similar UGC, e.g., Facebook posts. It can also support urban planners and other public departments in implementing policies related to revising transport fares, allocating budget for new railway services, and estimating travel demand, to name a few.
Although the model has been developed and tested for the railway service in Greater Mumbai, the same approach can be used for other transport mode(s) in other cities. Thus, in this research, we demonstrated that Twitter can provide fine-grained information on users’ travel experience in a more adaptive manner. The georeferencing module further demonstrates the potential of ungeotagged tweets for understanding mobility issues at various locations. The model presented in this paper can complement the existing transport infrastructure to bridge the gap between supply and demand, and the insights it retrieves can help to improve the existing service quality.