Article

Sentiment Matters for Cryptocurrencies: Evidence from Tweets

Department of International Business and Economics, The Bucharest University of Economic Studies, 010374 București, Romania
* Author to whom correspondence should be addressed.
Data 2025, 10(4), 50; https://doi.org/10.3390/data10040050
Submission received: 5 February 2025 / Revised: 29 March 2025 / Accepted: 30 March 2025 / Published: 1 April 2025

Abstract

This study provides empirical evidence that cryptocurrency market movements are influenced by sentiment extracted from social media. Using a high frequency dataset covering four major cryptocurrencies (Bitcoin, Ether, Litecoin, and Ripple) from October 2017 to September 2021, we apply state-of-the-art natural language processing techniques on tweets from influential Twitter accounts. We classify sentiment into positive, negative, and neutral categories and analyze its effects on log returns, liquidity, and price jumps by examining market reactions around tweet occurrences. Our findings show that tweets significantly impact trading volume and liquidity: neutral sentiment tweets enhance liquidity consistently, negative sentiments prompt immediate volatility spikes, and positive sentiments exert a delayed yet lasting influence on the market. This highlights the critical role of social media sentiment in influencing intraday market dynamics and extends the research on sentiment-driven market efficiency.
MSC:
91G80; 91B84; 68T50; 91G70

1. Introduction

A growing body of recent studies examines the connections between sentiment extracted from social media and financial markets (e.g., [1,2]). The Efficient Market Hypothesis (EMH) states that asset prices fully reflect available information, yet sentiment can cause short-term deviations, especially in markets with high retail investor activity such as cryptocurrencies. Liquidity plays a key role in absorbing sentiment shocks, with studies (e.g., [3,4]) showing that higher liquidity dampens price distortions.
The role of sentiment in cryptocurrency markets has been widely examined in recent research. One study [5] provides evidence that cryptocurrency prices react significantly to issuer sentiment expressed on Twitter, with trading volume increasing in response to strong sentiment signals. Their findings support the hypothesis that social media plays a crucial role in shaping market expectations and investor behavior, aligning with our motivation to investigate sentiment-driven market reactions. Similarly, ref. [6] highlights that sentiment extracted from news media influences volatility spillovers among cryptocurrencies, particularly for assets with strong community engagement. These studies emphasize the importance of sentiment as a key driver of market fluctuations, reinforcing our motivation to analyze its impact on cryptocurrency market dynamics.
This paper employs intraday data on cryptocurrencies and tweets issued by nine relevant entities to study the connections of sentiment indicators and characteristics of log returns, their jumps, and liquidity. Sentiment indicators are extracted using state-of-the-art natural language processing models, with tweets serving as reference points for analyzing crypto market dynamics.
We set forth the hypothesis that the connection between social media and crypto market activity is bilateral: posts on Twitter are triggered by market dynamics, and their presence in turn stimulates subsequent market action. We therefore investigate both directions of this relationship.
Our work contributes to the existing literature in several respects. Firstly, to the best of our knowledge, papers that analyze the impact of social media on financial markets either avoid the estimation of sentiment and focus only on the frequency or intensity of social media posts [7,8] or use various proxies for sentiment computation [5,6]. In contrast, our paper evaluates the efficiency of five different sentiment analysis models, ensuring a robust classification of market-relevant sentiment. By comparing multiple models, we reduce the risk of methodological bias and provide a stronger foundation for estimating the impact of sentiment on market dynamics. This multi-model approach significantly enhances sentiment estimation, offering a more accurate and comprehensive perspective than traditional single-model methods.
Secondly, our methodology for tweet selection improves upon prior approaches by broadening the dataset while preserving relevance. Previous studies either rely on a limited sample, such as tweets from a single influencer (as in [9]), or indiscriminately analyze all cryptocurrency-related tweets (as in [8]), which can introduce noise and reduce the reliability of sentiment signals. Our approach refines tweet selection by focusing on accounts with high market influence and using vocabulary-matching rules to filter out irrelevant data. This allows us to isolate sentiment that is more likely to drive market behavior, rather than general discussion noise.
Thirdly, in contrast to many prior studies that analyze only one cryptocurrency asset, we examine intraday data covering four cryptocurrencies, selected based on liquidity, over a four-year period. This broader approach allows us to investigate both market-wide sentiment effects and asset-specific characteristics. By employing high frequency five-minute data, we capture rapid market reactions to sentiment shifts, an aspect often overlooked in studies using daily or lower-frequency datasets. This enables us to test whether sentiment-driven effects persist across different cryptocurrencies and to identify patterns that may arise due to the underlying technological and economic differences among these assets.

2. Materials and Methods

Our five-minute frequency data covers the period from October 2017 to September 2021 and the most liquid cryptocurrencies, namely Bitcoin, Ether, Litecoin, and Ripple, totaling 1,682,804 observations. These cryptocurrency data were extracted using the CoinAPI platform, which provides historical price and market data via its REST API. The extraction process utilized specific endpoints to retrieve open-high-low-close volume (OHLCV) data for the cryptocurrencies being analyzed.
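For illustration, the extraction can be reproduced along the following lines. This is a minimal sketch assuming CoinAPI's standard OHLCV history route; the symbol identifier, date bounds, and API key are placeholders rather than the exact parameters of our extraction, and the full four-year sample additionally requires paging over successive time windows.

```python
import pandas as pd
import requests

API_KEY = "YOUR_COINAPI_KEY"  # placeholder
BASE_URL = "https://rest.coinapi.io/v1/ohlcv/{symbol}/history"

def fetch_ohlcv(symbol="BITSTAMP_SPOT_BTC_USD", start="2017-10-01T00:00:00",
                end="2021-09-30T23:55:00", period="5MIN", limit=100000):
    """Retrieve five-minute OHLCV bars for one symbol from the CoinAPI REST API."""
    params = {"period_id": period, "time_start": start, "time_end": end, "limit": limit}
    headers = {"X-CoinAPI-Key": API_KEY}
    resp = requests.get(BASE_URL.format(symbol=symbol), params=params,
                        headers=headers, timeout=30)
    resp.raise_for_status()
    bars = pd.DataFrame(resp.json())
    bars["time_period_start"] = pd.to_datetime(bars["time_period_start"])
    # returned columns include price_open/high/low/close, volume_traded, trades_count
    return bars.set_index("time_period_start")

btc = fetch_ohlcv()  # repeat with the ETH, LTC, and XRP symbol identifiers
```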
To develop the sentiment analysis, we created a pool of tweets. The tweets were collected using the Python library snscrape, which enabled the extraction of historical tweets based on specific queries for each cryptocurrency and the analyzed period. The process involves defining search criteria (e.g., username, date range), iterating through the retrieved tweets, and storing the relevant details (date, tweet ID, content). The key steps can be summarized as follows: (1) create an empty list for storing tweets, (2) instantiate TwitterSearchScraper with the desired query parameters, (3) loop through the retrieved tweets, extracting the relevant data, and (4) append the results to the list, as sketched below.
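A minimal sketch of this collection loop, assuming snscrape's TwitterSearchScraper interface; the account names and date bounds shown are illustrative.

```python
import snscrape.modules.twitter as sntwitter
import pandas as pd

def scrape_account(username, since="2017-10-01", until="2021-09-30"):
    """Collect date, tweet ID, and text for one account over the sample period."""
    query = f"from:{username} since:{since} until:{until}"
    rows = []                                            # (1) empty list for storing tweets
    scraper = sntwitter.TwitterSearchScraper(query)      # (2) scraper with specific parameters
    for tweet in scraper.get_items():                    # (3) loop through retrieved tweets
        rows.append({"date": tweet.date,                 # (4) append relevant details
                     "id": tweet.id,
                     "content": tweet.content})          # newer snscrape versions expose tweet.rawContent
    return pd.DataFrame(rows)

# illustrative usage for three of the nine accounts
tweets = pd.concat([scrape_account(u) for u in ("elonmusk", "Reuters", "CoinDesk")],
                   ignore_index=True)
```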
Our ad hoc criteria for selecting tweets are twofold: relevant sources and relevant tweets. For the former, we chose three categories of Twitter accounts, namely “crypto influencers” (a term used on forums to refer to persons with the capacity to influence the cryptocurrency markets), general news publications, and crypto-specific publications. For each category, we picked the three most representative accounts, ranked by number of followers, opinions on forums and specialized sites (displayed in Table A1 of Appendix A), and number of crypto-related tweets posted in the analyzed interval. The characteristics of the nine accounts are shown in Table 1.
Sources were selected based on their influence, determined by the number of followers, activity levels, and relevance to cryptocurrency discussions. For example, @elonmusk, a well-known influencer, was included due to his significant impact on crypto market movements. Similarly, @Reuters was selected as a reputable source of general news publications.
For relevant tweets, we extracted only those that contain the following words: “BTC”, “Bitcoin”, “ETH”, “Ether”, “Ethereum”, “LTC”, “Litecoin”, “XRP”, “Ripple”, “crypto”, “cryptocurrency”, “cryptocurrencies”, “altcoin”, “altcoins”. Our resulting sample across all categories consists of 74,078 unique observations (tweets), for which we recorded the time (at the minute level) and the containing text.
It is important to note that some tweets appeared in multiple cryptocurrency categories due to overlapping keywords. General terms such as ‘crypto’ and ‘cryptocurrency’ were included for all cryptocurrencies, ensuring comprehensive coverage. As a result, the aggregated totals for tweets across individual cryptocurrencies exceeded the unique dataset size of 74,078. For example, the dataset contained 58,436 tweets for BTC, 37,826 tweets for ETH, 32,331 tweets for LTC, and 33,421 tweets for XRP. These totals include contributions from crypto influencers, crypto-specific publications, and general news publications.
Specifically, BTC had 7716 tweets from crypto influencers, 45,774 from crypto-specific publications, and 946 from general news publications. ETH included 2442 tweets from influencers, 32,655 from crypto-specific publications, and 2729 from general news publications. Similarly, LTC consisted of 2189 tweets from influencers, 27,494 from crypto-specific publications, and 2648 from general news publications. Lastly, XRP had 2251 tweets from influencers, 28,381 from crypto-specific publications, and 2789 from general news publications.
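The effect of this keyword rule can be illustrated with a small, hypothetical helper that maps a tweet to the set of cryptocurrencies it is counted under; general terms assign the tweet to all four assets, which is exactly why the per-asset totals exceed the 74,078 unique tweets.

```python
import re

ASSET_KEYWORDS = {
    "BTC": {"btc", "bitcoin"},
    "ETH": {"eth", "ether", "ethereum"},
    "LTC": {"ltc", "litecoin"},
    "XRP": {"xrp", "ripple"},
}
GENERAL_TERMS = {"crypto", "cryptocurrency", "cryptocurrencies", "altcoin", "altcoins"}

def assets_for_tweet(text):
    """Return the set of cryptocurrency categories a tweet is assigned to."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    matched = {asset for asset, kws in ASSET_KEYWORDS.items() if tokens & kws}
    if tokens & GENERAL_TERMS:          # general terms count towards every asset
        matched.update(ASSET_KEYWORDS)
    return matched

assets_for_tweet("Crypto markets rally as Bitcoin surges")  # {'BTC', 'ETH', 'LTC', 'XRP'}
```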
Outliers, such as irrelevant or noisy tweets, were not removed at this stage to retain the integrity of the dataset. These were addressed later in the data cleaning process, as detailed in Section 2.1.

2.1. Sentiment Indicators

The data cleaning process involved several steps to standardize the tweets and remove irrelevant or noisy elements. First, Twitter mentions (e.g., @username), URLs, special characters, and excessive whitespace were removed to focus solely on the textual content. Emojis were also stripped due to their highly variable interpretation, which could introduce ambiguity in sentiment classification, ensuring that sentiment signals are derived from explicit textual content. Non-English words were filtered out to ensure that only recognized English words contributed to the sentiment analysis. Finally, any tweets left empty or containing no meaningful content after cleaning were removed from the dataset. This preprocessing ensured a cleaner, standardized dataset better suited for accurate sentiment analysis. We performed no additional alterations to the data; thus, the impact of preprocessing on the sentiment scores was minimal.
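These cleaning steps can be illustrated with a short, regex-based sketch; the exact patterns and the English word list used in our preprocessing may differ.

```python
import re

def clean_tweet(text):
    """Strip mentions, URLs, emojis/special characters, and excessive whitespace."""
    text = re.sub(r"@\w+", " ", text)                  # Twitter mentions
    text = re.sub(r"http\S+|www\.\S+", " ", text)      # URLs
    text = re.sub(r"[^A-Za-z0-9\s.,!?'-]", " ", text)  # emojis and special characters
    text = re.sub(r"\s+", " ", text).strip()           # excessive whitespace
    # non-English tokens can additionally be filtered against an English word list
    # (e.g., nltk.corpus.words) before sentiment scoring
    return text

clean_tweet("Huge news! @CoinDesk says #Bitcoin to the moon 🚀 https://t.co/xyz")
# -> 'Huge news! says Bitcoin to the moon'
```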
To extract sentiment from tweets, we employ a combination of deep learning and lexicon-based models to ensure a comprehensive classification. Specifically, we use five models: Bidirectional Encoder Representations from Transformers (BERT), a model developed by [10]; FINBERT, created by [11] as an adaptation of the BERT model to the financial domain; RoBERTa, developed by [12] as a derivation of BERT that examines the impact of key hyperparameters and training data size; the Loughran & McDonald Master Dictionary, first presented in [13]; and VADER, a sentiment analysis tool specifically designed for social media, developed by [14]. These models were implemented using the transformers library in Python (version 3.11.7), along with standard lexicon-based sentiment scoring techniques.
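For transparency, the five scorers can be instantiated along the following lines. The Hugging Face checkpoints named below are common public choices and should be read as assumptions rather than the exact checkpoints used in our estimation, and the Loughran & McDonald dictionary is represented by a generic word-list lookup.

```python
from transformers import pipeline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Transformer-based classifiers (checkpoint names illustrative)
bert = pipeline("sentiment-analysis",
                model="distilbert-base-uncased-finetuned-sst-2-english")   # positive/negative only
finbert = pipeline("sentiment-analysis", model="ProsusAI/finbert")         # positive/negative/neutral
roberta = pipeline("sentiment-analysis",
                   model="cardiffnlp/twitter-roberta-base-sentiment")      # labels map to neg/neutral/pos

vader = SentimentIntensityAnalyzer()

def vader_class(text):
    """Return the VADER class with the highest score together with that score."""
    s = vader.polarity_scores(text)
    return max([("positive", s["pos"]), ("neutral", s["neu"]), ("negative", s["neg"])],
               key=lambda pair: pair[1])

def lm_class(text, positive_words, negative_words):
    """Loughran & McDonald style scoring: shares of positive vs. negative dictionary words."""
    tokens = text.lower().split()
    pos = sum(t in positive_words for t in tokens) / max(len(tokens), 1)
    neg = sum(t in negative_words for t in tokens) / max(len(tokens), 1)
    return ("positive", pos) if pos > neg else ("negative", neg) if neg > pos else ("neutral", 1.0)
```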
The selection of models was based on their complementary functioning, which facilitates the accurate identification of sentiment extracted from social media posts. BERT, FINBERT, and RoBERTa, as deep learning models, excel in capturing context, subtle nuances, and complex structures such as sarcasm and negation, making them highly effective for analyzing sentiment in social media. However, they are computationally intensive and may inherit biases from training data. In contrast, Loughran & McDonald and VADER, as lexicon-based models, are efficient for detecting polarity in financial texts and clear expressions, although they are less effective in handling nuanced or ambiguous language.
Combining these models leverages the strengths of both approaches: deep learning models interpret complex sentiment cues, while lexicon-based models ensure straightforward and stable classification. This hybrid approach minimizes individual model limitations, enabling a more robust and comprehensive sentiment analysis. Compared to classical machine learning techniques, such as Naïve Bayes and Support Vector Machines (SVM), which rely primarily on word frequency, the models used in this study provide a more nuanced understanding of sentiment by capturing contextual meaning. Prior studies, such as [15], have demonstrated that deep learning models, particularly LSTM architectures, outperform traditional machine learning techniques in classifying sentiment within cryptocurrency-related tweets. Their findings reinforce our choice of deep learning-based approaches for extracting meaningful sentiment signals from social media data.
These five models produce different outputs. Training a BERT model results in the assignment of either the positive or the negative label to a tweet, while FINBERT and RoBERTa also allow for the tweet to have a neutral sentiment. Based on the probability for a tweet to be part of each of these categories, these models deliver scores that quantify the intensity with which a certain tweet belongs to a certain category (positive, negative, or neutral).
From a different perspective, the Loughran & McDonald Master Dictionary searches each tweet for words that belong to the positive class and words that belong to the negative class. It outputs scores computed as proportions of the number of words matched from each class to the total number of words in the tweet. Therefore, it flags a tweet as positive if the proportion of positive words is larger than the proportion of negative words. The VADER model uses yet another methodology and delivers probabilities for a tweet to be positive, negative, or neutral.
Given these various types of outputs, we designed a standardized approach with the purpose of measuring the extent to which these models achieve consensus for the classification of each tweet. We therefore adapted the outputs of the Loughran & McDonald Master Dictionary and VADER models to resemble the ones of BERT, FINBERT, and RoBERTa.
To this end, in the case of the Loughran & McDonald Master Dictionary, we retained the proportions of positive and negative words for each tweet. As such, for example, if the number of positive and negative words was equal, the sentiment using the Loughran & McDonald Master Dictionary would be neutral. For VADER, we consider a tweet to be positive, negative, or neutral according to the highest score generated by the model.
Using this standardized framework, we designed two methods to calculate the sentiment of each tweet. The first approach (method M1) consisted of calculating a weighted arithmetic mean of the sentiment indicator sign (which was 1 for positive, 0 for neutral, and −1 for negative) with the associated score computed using all five models described above. To classify tweets, we divided the interval of these weighted averages, [−1; 1], into three sub-intervals, i.e., [−1; −0.33), [−0.33; 0.33], and (0.33; 1]. As such, values situated in the first sub-interval indicated a negative sentiment, the ones in the second sub-interval indicated a neutral sentiment, and the ones in the last sub-interval indicated a positive sentiment.
As far as the second method is concerned (M2), for each tweet, we used the mode to retrieve the most frequently occurring category (positive, negative, or neutral) across all models.
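Both rules can be written compactly. The following is a minimal sketch assuming each model's output has already been standardized to a (sign, score) pair, with the score acting as the weight in M1 (our reading of the weighting scheme).

```python
from statistics import mode

def sentiment_m1(outputs):
    """M1: score-weighted mean of the signs across the five models, bucketed into three intervals."""
    # outputs: list of (sign, score) pairs with sign in {-1, 0, 1} and score in [0, 1]
    weighted = sum(sign * score for sign, score in outputs) / sum(score for _, score in outputs)
    if weighted < -0.33:
        return "negative"
    if weighted <= 0.33:
        return "neutral"
    return "positive"

def sentiment_m2(labels):
    """M2: the most frequent class across the five models."""
    return mode(labels)

# Tweet 5 of Table A2: four models label it positive, FINBERT labels it neutral
sentiment_m1([(1, 0.980), (1, 0.999), (0, 0.839), (1, 0.722), (1, 1.0)])   # -> 'positive'
sentiment_m2(["POSITIVE", "POSITIVE", "NEUTRAL", "POSITIVE", "POSITIVE"])  # -> 'POSITIVE'
```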
These two methods were chosen for their ability to complement one another. Method M1 captures subtle sentiment variations by weighting scores based on the confidence of each model, while Method M2 ensures robustness by selecting the most frequent sentiment category, reducing the impact of outliers. Although ensemble methods or a single optimized model could also be viable, we opted against them to maintain a straightforward and interpretable framework, avoiding over-reliance on any single model. By combining M1 and M2, we aimed to achieve a balance between detail and stability, tailored to the diverse outputs of the five models.
While the selected models offer complementary strengths, each comes with inherent biases, especially in the noisy and subjective context of social media. Deep learning models, such as BERT, FINBERT, and RoBERTa, can misinterpret less common expressions or ambiguous contexts due to biases in their training data. To counterbalance this, we combined them with lexicon-based models such as Loughran & McDonald and VADER, which excel in detecting straightforward polarity in financial texts but struggle with sarcasm, negation, or evolving language. By standardizing outputs across models and using the complementary methods M1 and M2, we aimed to minimize the limitations of individual models and ensure a balanced, robust sentiment analysis.
The next step in our analysis dealt with the matching of tweets and their sentiment gauges with the market dynamics. We used the timestamps of tweets to position them in the five-minute intervals for each cryptocurrency in our sample (Bitcoin, Ether, Litecoin, and Ripple). In cases where we found two or more tweets posted in a five-minute interval, we established, for each of the five models, a unique sentiment class and an associated score.
For the first method, we used the scores of the two (or more) tweets to compute an average and then classified the tweets from this interval as positive, negative, or neutral according to their position in one of the three intervals ([−1; −0.33), [−0.33; 0.33], and (0.33; 1]).
We also used the second method, where we classified the tweets from the respective interval based on the mode, that is, we selected the sentiment category with the highest frequency among the tweets. Table A2 (Appendix A) presents examples illustrating the sentiment estimation process.
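The matching and within-interval aggregation can be sketched with pandas; the column names (m1_score, m2_label) are illustrative and assume each tweet already carries its per-tweet M1 score and M2 class from the previous step.

```python
import pandas as pd

def attach_sentiment(bars, tweets):
    """Map each tweet to its five-minute bar and aggregate multiple tweets per bar."""
    tweets = tweets.copy()
    tweets["interval"] = tweets["date"].dt.floor("5min")   # start of the containing 5-min bar

    def aggregate(group):
        m1_avg = group["m1_score"].mean()                  # average of the per-tweet scores
        if m1_avg < -0.33:
            m1 = "negative"
        elif m1_avg <= 0.33:
            m1 = "neutral"
        else:
            m1 = "positive"
        m2 = group["m2_label"].mode().iloc[0]              # most frequent class in the interval
        return pd.Series({"sent_m1": m1, "sent_m2": m2, "n_tweets": len(group)})

    per_bar = tweets.groupby("interval").apply(aggregate)
    return bars.join(per_bar)                              # bars are indexed by interval start

bars = attach_sentiment(btc, tweets)                       # btc and tweets from the sketches above
```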

2.2. Measuring Jumps and Liquidity

To reveal the impact of tweets on cryptocurrencies, we computed jumps and liquidity. Jumps were detected by applying the methodology put forward by [16], which relies on the estimation of local volatility, which is notoriously shifting for cryptocurrencies. We calculated:
L_i = \frac{\log\big( S(T_i)/S(T_{i-1}) \big)}{\sigma(T_i)}  (1)

\sigma(T_i)^2 = \frac{1}{k-2} \sum_{j=i-k+2}^{i} \big( \log S(T_j) - \log S(T_{j-1}) \big)^2  (2)

where S(Ti) is the price at the end of time interval i (so the numerator of Li is the log return over interval i), [0, T] is the sample period, n is the number of observations in [0, T], and k is the length of the local window (in number of observations) used to estimate the volatility. For five-minute data, the authors of [16] suggest k = 10.
Using the properties of the Gumbel distribution, we can reject the null hypothesis that there is no jump if:
\frac{\left| L_i \right| - C_n}{S_n} > 4.6001  (3)

where

C_n = \frac{\sqrt{2\log n}}{c} - \frac{\log \pi + \log(\log n)}{2c\sqrt{2\log n}}, \qquad S_n = \frac{1}{c\sqrt{2\log n}}, \qquad c = \sqrt{2/\pi} \approx 0.7979  (4)

and 4.6001 ≈ −log(−log 0.99) is the 1% critical value of the standard Gumbel distribution.
The Gumbel distribution is commonly used for detecting extreme price movements, but as with any statistical approach, it may lead to classification errors. Specifically, some normal volatility spikes might be misclassified as jumps (false positives), while certain genuine jumps might not be detected (false negatives). However, given the high frequency of cryptocurrency price changes, our methodology ensures a reliable identification of significant jumps.
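For reference, the test can be implemented on the five-minute closing prices as follows. This is a sketch that uses the simplified local volatility estimator of Equation (2) rather than the full estimator of [16], with k = 10 as suggested in the text; the column and variable names continue the earlier fragments.

```python
import numpy as np

def lee_mykland_jumps(prices, k=10, threshold=4.6001):
    """Flag five-minute intervals whose return is classified as a jump."""
    r = np.log(prices).diff()                               # log returns
    # local volatility from the trailing window of squared returns (cf. Equation (2))
    sigma = np.sqrt(r.pow(2).rolling(window=k - 1).sum() / (k - 2))
    L = r / sigma                                           # Equation (1)
    n = len(r)
    c = np.sqrt(2.0 / np.pi)                                # ~0.7979
    Cn = np.sqrt(2 * np.log(n)) / c \
        - (np.log(np.pi) + np.log(np.log(n))) / (2 * c * np.sqrt(2 * np.log(n)))
    Sn = 1.0 / (c * np.sqrt(2 * np.log(n)))
    return (L.abs() - Cn) / Sn > threshold                  # Equation (3): reject "no jump"

jumps = lee_mykland_jumps(btc["price_close"])               # btc from the CoinAPI sketch
```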
Our measure of liquidity is rooted in the illiquidity (ILLIQ) measure of Amihud [17]. We chose this measure based on the recommendations of [18] regarding the use of Amihud’s ILLIQ gauge for measuring the 5 min price impact of factors.
ILLIQ(T_i) = \frac{\left| S(T_i) \right|}{Vol(T_i)}  (5)
where S(Ti) is the log return and Vol(Ti) the trading volume in time interval i.
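Given five-minute prices and volumes, the computation of Equation (5) is immediate; a sketch continuing the column names used above.

```python
import numpy as np

def amihud_illiq(prices, volume):
    """Amihud (2002) price-impact measure per five-minute interval: |log return| / volume."""
    return np.log(prices).diff().abs() / volume

illiq = amihud_illiq(btc["price_close"], btc["volume_traded"])
```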

2.3. Construction of Variables and Definition of Impact Window

To summarize, after matching tweets with time intervals and market features, for each five-minute interval we retained the following variables:
- trading volume: Vol(Ti);
- number of transactions: Trades(Ti);
- log return: S(Ti);
- dummy variable for jumps: J(Ti) = 1 if a jump occurs in interval i, 0 otherwise;
- dimension of the jump (also referred to as “jump intensity”): Li;
- Amihud illiquidity measure: ILLIQ(Ti);
- tweet-event dummy variable: Dev(Ti) = 0 before the tweet interval, 1 otherwise;
- sentiment dummy variable: Dsent(Ti) = 1 if the sentiment is Sent, 0 otherwise, where Sent can be positive, negative, or neutral.
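These variables can be assembled into a single per-interval panel. The sketch below reuses the objects from the earlier hypothetical fragments and shows only a per-interval tweet indicator; the event dummy Dev used in Section 3 is instead defined per event window (0 before the tweet interval, 1 from it onward).

```python
import numpy as np
import pandas as pd

def build_panel(bars, jumps, illiq):
    """Assemble the per-interval variables of Section 2.3 (bars already carry sent_m1)."""
    panel = pd.DataFrame(index=bars.index)
    panel["Vol"] = bars["volume_traded"]                     # trading volume
    panel["Trades"] = bars["trades_count"]                   # number of transactions
    panel["S"] = np.log(bars["price_close"]).diff()          # log return
    panel["J"] = jumps.astype(int)                           # 1 if jump, 0 otherwise
    panel["ILLIQ"] = illiq                                    # Amihud illiquidity
    for sent in ("positive", "negative", "neutral"):          # sentiment dummies Dsent
        panel[f"D_{sent}"] = (bars["sent_m1"] == sent).astype(int)
    panel["D_tweet"] = bars["sent_m1"].notna().astype(int)    # a tweet occurred in this interval
    return panel

panel = build_panel(bars, jumps, illiq)
```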

3. Results

We analyzed two types of situations (depicted in Figure 1): firstly, we created measures for the one-hour intervals before and after each five-minute time interval in which Twitter posts took place, and secondly, we analyzed all cases of 125 min time frames (one hour before, the five-minute interval itself, and one hour after) surrounding five-minute intervals where no tweets were posted. Comparisons of these two situations allowed us to extract information about the impact of the presence of tweets.
The indicators for which we computed averages for the three cases (before and after in the presence of tweets and across all 125 min for situations with no tweets) were the trading volume, the number of transactions, the log returns, the Amihud ILLIQ indicator, and the jumps dimension. For each case, we also computed the standard deviation for log returns and jump dimensions.
Table 2 highlights key differences between periods with and without tweets. Across all cryptocurrencies, log returns exhibit higher volatility in tweet-active periods, indicating greater price fluctuations. Trading volume also increases significantly when tweets are present, suggesting heightened market participation. Additionally, liquidity improves, as reflected by a lower Amihud ILLIQ measure, particularly for certain assets, reinforcing the role of social media in influencing market efficiency. These results confirm the previous findings of [4]. They are also valid regardless of the type of sentiment extracted from tweets.
We performed a replication of this analysis using the other variables mentioned in Section 2.3. Results on the reaction of log returns, dimensions of jumps, and volatility of jumps are presented in Table A3 (Appendix A). We notice that jumps are generally negative on average, with more positive cases usually after tweets (negative for XRP and positive for BTC).
Using simple and logistic regressions, we studied the dependence between the presence of tweets marked as positive, negative, and neutral, on the one hand and the trading volume (Vol(Ti)), the number of transactions (Trades(Ti)), the log returns (S(Ti)), the Amihud ILLIQ (Equation (5)), and the presence of jumps (J(Ti)) on the other hand. Table A4 (Appendix A) demonstrates that neutral sentiment has the strongest and most persistent impact on liquidity across all analyzed cryptocurrencies. Unlike positive and negative sentiment, which show no significant immediate effects, their influence emerges at later time lags, suggesting that sentiment polarity is shaped by prior market conditions rather than acting as an immediate driver of price changes. Regression results indicate that the effects of tweet occurrences on trading volume and liquidity remain statistically significant over multiple five-minute intervals, with p-values consistently close to zero across all tested models. This persistence suggests that sentiment effects are not instantaneous but rather unfold progressively, influencing market behavior beyond the initial reaction.
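As an illustration of the logistic specification reported in Table A4 (the sentiment dummy regressed on lagged Amihud illiquidity), a sketch with statsmodels; the variable names follow the hypothetical panel constructed above.

```python
import pandas as pd
import statsmodels.api as sm

def tweet_on_illiq(panel, sentiment="neutral", lag=1):
    """Logit of the sentiment dummy on lagged ILLIQ: DSent(Ti) = L(x*beta) + eps(Ti)."""
    y = panel[f"D_{sentiment}"]
    x = sm.add_constant(panel["ILLIQ"].shift(lag))
    data = pd.concat([y, x], axis=1).dropna()
    model = sm.Logit(data.iloc[:, 0], data.iloc[:, 1:]).fit(disp=0)
    return model.params, model.pvalues

# e.g., neutral-sentiment tweets versus illiquidity one five-minute interval earlier
params, pvalues = tweet_on_illiq(panel, sentiment="neutral", lag=1)
```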
The regressions performed, as shown in Table A5 (Appendix A), indicate no causal relationships between log returns and sentiment, regardless of the direction of dependence. Neither sentiment as an explanatory variable for log returns nor log returns as a determinant of sentiment yields statistically significant results, suggesting that market returns and sentiment dynamics are largely independent.

4. Discussion

The results presented in this paper underscore the complex and reciprocal nature of the relationship between social media-extracted sentiment and cryptocurrency market dynamics. Notably, the results indicate that periods with tweets, regardless of sentiment polarity, have greater liquidity and volatility of log returns. This observation strengthens the support for the hypothesis that social media activities can amplify market movements, consistent with prior findings such as the ones in [4].
While our results indicate a clear relationship between social media sentiment and market dynamics, alternative explanations should be considered. For instance, the observed correlations could be driven by external macroeconomic factors that simultaneously influence social media activity and market behavior. Moreover, sentiment scores may not fully capture the nuanced impact of specific events or news that shape market expectations.
The results of the regressions show that neutral sentiment tweets have notable and lasting effects on market liquidity for all the analyzed cryptocurrencies. This indicates that the market might perceive neutral tweets as signals of stability or as subtle cues within the broader social media sphere.
The evidence of lagged effects illustrates that sentiment polarity traces delayed impacts across market variables, underscoring the complexity of this interaction. Specifically, although positive and negative sentiments do not affect liquidity instantly, the effects become significant in the next intervals. Such a pattern would suggest that market participants need time to digest the content of tweets, correlating it with other market signals.
Importantly, the study also uncovers an interesting asymmetry in the way markets respond: neutral and negative tweets appear to correlate more closely with contemporaneous jumps, while positive tweets exhibit more lagged yet longer-lasting associations with market behavior. This asymmetry could be due to the behavioral biases of investors, who tend to react more strongly to bad news, or to the informational content of bad news being incorporated into prices.
One important aspect to consider is that our dataset spans from October 2017 to September 2021, covering the COVID-19 pandemic, a period of heightened volatility and investor uncertainty. Previous research (such as [19]) has shown that the pandemic significantly impacted cryptocurrency markets, making it a relevant factor to explore. However, our study focuses on sentiment-driven market reactions in a high frequency setting, where the primary objective is to capture immediate price adjustments following sentiment shifts, regardless of broader macroeconomic events. Separating pre-pandemic and pandemic sentiment effects would require a distinct methodological approach that accounts for structural breaks in market behavior. While this is beyond the scope of our study, future research could examine whether sentiment dynamics operate differently in crisis versus non-crisis periods by integrating a regime-switching model or explicit event-based segmentation.
We acknowledge that cryptocurrencies, while exhibiting unique behaviors, are still influenced by broader economic conditions. Factors such as interest rates, inflation data, and major financial events may contribute to price fluctuations, potentially affecting both market sentiment and trading activity. Moreover, discussions on social media about cryptocurrencies may themselves be driven by economic news, making it challenging to fully separate sentiment-driven effects from broader macroeconomic influences.
Our study deliberately isolates the role of sentiment in cryptocurrency market movements, focusing on short-term reactions rather than long-term macroeconomic trends. Previous studies (e.g., [20]) have already examined the impact of macroeconomic indicators on cryptocurrency pricing, whereas our work contributes to the growing body of literature that highlights sentiment as a key driver of short-term liquidity and volatility.
Future research could explore additional sentiment sources beyond Twitter, such as Reddit or financial news, to assess whether sentiment effects differ across platforms. Furthermore, hybrid models that integrate both sentiment and macroeconomic data could help disentangle their respective influences on cryptocurrency markets. A potential approach would involve distinguishing between sentiment derived from independent discussions and sentiment reacting to financial news, thereby refining the understanding of sentiment-driven market behavior.
Moreover, despite its contributions, this study has certain limitations. First, the dataset is limited to tweets in English, potentially excluding valuable sentiment data from non-English-speaking regions. Second, the models used for sentiment analysis, while robust, may not fully capture the nuanced language of social media, such as sarcasm or context-specific emojis. Future research could address these limitations by incorporating multilingual datasets or exploring advanced sentiment models.

5. Conclusions

We present evidence that sentiment indicators calculated from relevant social media posts, using current state-of-the-art machine learning tools from the field of natural language processing, are connected to market dynamics. Our analysis allows us to identify the market features that relate to these posts and to quantify their impact.
Tweets appear to both coincide with market activity and be followed by changes in market dynamics. As such, we allow for the existence of bidirectional causality between sentiment indicators and the presence of tweets on one hand and several features of market dynamics on the other hand.
The results indicate that tweets are associated with an increase in liquidity and volatility of cryptocurrencies’ log returns, with effects that vary depending on their polarity. We also found some evidence of a relationship between log returns, jump activity, and the sentiment of tweets in the vicinity of their time frame, which weakens as we extend the time horizon.

Author Contributions

All activities related to this manuscript, including conceptualization, methodology, investigation, formal analysis, writing—original draft preparation, and writing—review and editing, were carried out jointly by R.L. and P.C.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Specialized websites used for selection of the most representative accounts ranked by number of followers.
Table A2. Sentiment estimation process.

| Tweet ID | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Keyword | “Bitcoin” | “Ripple” | “Litecoin” | “altcoin” | “Bitcoin” |
| Account | Bloomberg Crypto | Financial Times | CoinDesk | CoinTelegraph | Michael Saylor |
| Timestamp | 2018-02-02 13:17:17 | 2017-10-10 13:34:59 | 2021-09-13 19:35:18 | 2021-09-30 22:37:01 | 2021-09-21 13:10:37 |
| RoBERTa Sentiment | NEUTRAL | NEUTRAL | NEUTRAL | NEGATIVE | POSITIVE |
| RoBERTa Sign | 0 | 0 | 0 | −1 | 1 |
| RoBERTa Coef. | 0.826 | 0.901 | 0.881 | 0.787 | 0.980 |
| BERT Sentiment | NEGATIVE | NEGATIVE | NEGATIVE | NEGATIVE | POSITIVE |
| BERT Sign | −1 | −1 | −1 | −1 | 1 |
| BERT Coef. | 0.832 | 0.988 | 0.998 | 0.999 | 0.999 |
| FINBERT Sentiment | NEUTRAL | NEUTRAL | NEGATIVE | NEGATIVE | NEUTRAL |
| FINBERT Sign | 0 | 0 | −1 | −1 | 0 |
| FINBERT Coef. | 0.915 | 0.813 | 0.922 | 0.710 | 0.839 |
| VADER Sentiment | NEUTRAL | NEUTRAL | NEUTRAL | NEGATIVE | POSITIVE |
| VADER Sign | 0 | 0 | 0 | −1 | 1 |
| VADER Coef. | 1 | 1 | 0.922 | 0.667 | 0.722 |
| L&M Sentiment | NEUTRAL | NEUTRAL | NEUTRAL | NEGATIVE | POSITIVE |
| L&M Sign | 0 | 0 | 0 | −1 | 1 |
| L&M Coef. | 1 | 1 | 1 | 1 | 1 |
| M1 Sentiment | NEUTRAL | NEUTRAL | NEGATIVE | NEGATIVE | POSITIVE |
| M2 Sentiment | NEUTRAL | NEUTRAL | NEUTRAL | NEGATIVE | POSITIVE |
Table A3. Returns and jumps dimension during periods with/without tweets.
Tweets/No Tweets PeriodsLog Returns-MeanJumps Dimension (Li)-MeanJumps Dimension-Std.Dev
1 h Before Tweets/First 60 min Without Tweets1 h After Tweets/Last 60 min Without Tweets1 h Before Tweets/First 60 min Without Tweets1 h After Tweets/Last 60 min Without Tweets1 h Before Tweets/First 60 min Without Tweets1 h After Tweets/Last 60 min Without Tweets
M1M2M1M2M1M2M1M2M1M2M1M2
BTCAll1.62 × 10−51.69 × 10−5−0.353958848−0.0102224821.8073754118.01872714
Positive1.82 × 10−52.77 × 10−52.58 × 10−51.8 × 10−5−0.086050.5539810.1972840.26811610.101028.6797638.3807718.579001
Neutral1.69 × 10−51.42 × 10−51.71 × 10−52.02 × 10−5−0.25753−0.380860.053163-0.0812421.6153220.9408320.8876321.28642
Negative1.21 × 10−51.71 × 10−59.72 × 10−63.6 × 10−6−0.92607−1.01116-0.3953-0.1557627.9003932.430579.4606759.224091
No tweets−8.89 × 10−6−5.27 × 10−6−0.06565678−0.55785771539.1980345520.91391437
ETHAll5.4 × 10-68.9 × 10−6−0.249002058−0.3338735867.4501284077.49117884
Positive5.01 × 10−54.88 × 10−53.27 × 10−51.52 × 10−5−0.45885−0.20885-0.27496-0.273297.8132647.9937837.4280387.557206
Neutral4.95 × 10−62.12 × 10−76.32 × 10−76.02 × 10−7−0.15422−0.23324-0.31073-0.280037.389977.3978177.5653027.482183
Negative−2 × 10−5−1.5 × 10−62.02 × 10−52.22 × 10−5−0.45553−0.39214-0.35417-0.587267.3988177.3009567.3131657.572112
No tweets6.78 × 10−65.88 × 10−6−0.0703079660.0269743137.7986210917.700499101
LTCAll−5.46 × 10−69.7 × 10−7−0.179185507−0.3000462337.7880371717.373820222
Positive1.11 × 10−51.53 × 10−51.78 × 10−58.28 × 10−6−0.3414−0.26047−0.25228−0.405457.3508027.2768337.8288387.389789
Neutral−5.4 × 10−6−8.6 × 10−6−4.8 × 10−65.13 × 10−7−0.11059−0.16753−0.42245−0.29897.9082987.875657.3186587.412558
Negative−1.8 × 10−6−4 × 10−69.9 × 10−6−3 × 10−6−0.31619−0.186310.162506−0.158387.6492597.6338567.3388257.289335
No tweets1.05 × 10−53.95 × 10−6−0.0373584430.0370973747.4615271877.41699811
XRPAll−2.58 × 10−6−1.66 × 10−6−0.02145997−0.1246683487.3543652537.653179333
Positive1.22 × 10−51.24 × 10−51.93 × 10−5−1.8 × 10−6−0.025510.0190640.165351−0.164397.7529387.2773416.8719557.962178
Neutral4.85 × 10−7−3.8 × 10−6−1 × 10−5−1.1 × 10−60.069680.009106−0.23124−0.143387.2899877.3689638.0003517.745979
Negative−2.66 × 10−5−8.2 × 10−61.41 × 10−59.7 × 10−6−0.36224−0.262260.0456570.028157.2836877.3305737.0414956.983758
No tweets1.68 × 10−51.27 × 10−5−0.126518695−0.0172564238.6206776417.687701938
Table A4. Statistical outputs (p-values, standard errors, R-squared, β-coefficients) for regressions of (presence of) tweets on Amihud ILLIQ.
Tweetsp-ValuesStandard Errors
Lag 0Lag 1Lag 2Lag 3Lag 0Lag 1Lag 2Lag 3
M1M2M1M2M1M2M1M2M1M2M1M2M1M2M1M2
BTCAll00000000
Positive0.2230.1390.0730.2480.060.0460.2290.09400000000
Neutral00.00100000000000000
Negative0.1060.0970.0280.0390.120.0270.0810.10900000000
ETHAll00000000
Positive0.2310.1220.0950.0550.0180.0110.0320.02600000000
Neutral0000000000000000
Negative0.0660.0790.050.0560.0280.0550.0530.07900000000
LTCAll00.00100.0020000
Positive0.1050.160.8280.7120.0610.0650.5860.09500000000
Neutral0000.0010.0010.0010.0010.0200000000
Negative0.1140.1220.5630.1430.1950.1770.9390.16300000000
TweetsR-squaredβ-coefficients
Lag 0Lag 1Lag 2Lag 3Lag 0Lag 1Lag 2Lag 3
M1M2M1M2M1M2M1M2M1M2M1M2M1M2M1M2
BTCAll00.00010.000100000
Positive0000000000000000
Neutral000.00010.0001000000000000
Negative0000000000000000
ETHAll00.00010.00010.00010000
Positive0000000000000000
Neutral00000.00010.00010000000000
Negative0000000000000000
LTCAll00000000
Positive0000000000000000
Neutral0000000000000000
Negative0000000000000000
Note: Model specification: DSent(Ti) = L(xβ) + ϵ(Ti), where lag = {0, 1, 2, 3}, x is ILLIQ(Ti − lag), and L(x) = 1/(1 + exp(−x)) is the logistic function.
Table A5. p-values for regressions of (presence of) Tweets and log returns.
p-ValueTweetsTweets–Independent VariableTweets–Dependent Variable
Lag 0Lag 1Lag 2Lag 3Lag 0Lag 1Lag 2Lag 3
M1M2M1M2M1M2M1M2M1M2M1M2M1M2M1M2
BTCAll0.9530.8780.4160.2760.9530.8780.4160.276
Positive0.6650.0450.3860.2360.0080.9110.4060.0510.6650.0450.3860.2360.0080.0890.4060.051
Neutral0.9470.5320.5950.7580.0070.8810.4840.4330.9470.5320.5950.7580.8810.8540.4840.433
Negative0.5290.520.1570.140.6670.8680.0690.0030.5290.520.1570.140.8680.9410.0690.003
ETHAll0.670.9910.6160.2360.670.9910.6160.236
Positive0.4170.3390.3570.8830.2490.7050.1430.8440.4170.3390.3570.8830.2490.7050.1430.844
Neutral0.8270.9830.8780.8750.470.8750.1990.1420.8270.9830.8780.8750.470.8750.1990.142
Negative0.2060.7460.2820.7750.4380.5140.1260.8790.2060.7460.2820.7750.4380.5140.1260.879
LTCAll0.5650.1520.5450.0460.5640.1520.5450.046
Positive0.0460.9420.4230.1070.8470.4830.6030.1590.0460.9420.4230.1070.8470.4830.6030.159
Neutral0.8610.2620.3120.2980.3740.810.1690.0510.8610.2620.3120.2980.3740.810.1690.051
Negative0.4430.2580.520.9040.8830.0970.1370.5310.4430.2580.520.9040.8830.0970.1370.531
XRPAll0.1520.0530.9640.7240.1520.0530.9640.724
Positive0.1830.080.5930.8350.8350.8510.0280.1830.1830.080.5930.8350.8350.8510.0280.183
Neutral0.3550.1980.0240.0450.7110.6550.1960.3190.3550.1980.0240.0450.7110.6550.1960.319
Negative0.0080.0120.5690.820.5280.2790.7870.8250.0080.0120.5690.820.5280.2790.7870.825
Note: Section “Tweets–independent variable” provides results for the equation DSent(Ti) = L(x × β) + ϵ(Ti), where x is J(Ti − lag) and L(x) = 1/(1 + exp(−x)) is the logistic function. Section “Tweets–dependent variable” depicts results from the equation J(Ti) = β × DSent(Ti − lag) + ϵ(Ti).

References

  1. Pedersen, L.H. Game on: Social networks and markets. J. Financ. Econ. 2022, 146, 1097–1119. [Google Scholar] [CrossRef]
  2. Broadstock, D.C.; Zhang, D. Social-media and intraday stock returns: The pricing power of sentiment. Financ. Res. Lett. 2019, 30, 116–123. [Google Scholar] [CrossRef]
  3. Chordia, T.; Roll, R.; Subrahmanyam, A. Market Liquidity and Trading Activity. J. Financ. 2001, 56, 501–530. [Google Scholar] [CrossRef]
  4. Hendershott, T.; Jones, C.M.; Menkveld, A.J. Does Algorithmic Trading Improve Liquidity? J. Financ. 2011, 66, 1–33. [Google Scholar] [CrossRef]
  5. Zhang, J.; Zhang, C. Do cryptocurrency markets react to issuer sentiments? Evidence from Twitter. Res. Int. Bus. Financ. 2022, 61, 101656. [Google Scholar] [CrossRef]
  6. Akyildirim, E.; Aysan, A.F.; Cepni, O.; Serbest, Ö. Sentiment matters: The effect of news-media on spillovers among cryptocurrency returns. Eur. J. Financ. 2024, 30, 1577–1613. [Google Scholar] [CrossRef]
  7. Choi, H. Investor attention and bitcoin liquidity: Evidence from bitcoin tweets. Financ. Res. Lett. 2021, 39, 101555. [Google Scholar] [CrossRef]
  8. Akyildirim, E.; Sensoy, A.; Corbet, S.; Yarovaya, L. Social media sentiment and its impact on cryptocurrency returns and volatility. J. Financ. Econ. 2021, 142, 101234. [Google Scholar]
  9. Huynh, T.L.D. Does Bitcoin React to Trump’s Tweets? J. Behav. Exp. Financ. 2021, 31, 100546. [Google Scholar] [CrossRef]
  10. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar] [CrossRef]
  11. Araci, D.T. FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv 2019. [Google Scholar] [CrossRef]
  12. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019. [Google Scholar] [CrossRef]
  13. Loughran, T.; McDonald, B. When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. J. Financ. 2011, LXVI, 35–65. [Google Scholar] [CrossRef]
  14. Hutto, C.J.; Gilbert, E. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; pp. 216–225. [Google Scholar]
  15. Nair, M.; Abd-Elmegid, L.A.; Marie, M.I. Sentiment analysis model for cryptocurrency tweets using different deep learning techniques. J. Intell. Syst. 2024, 33, 20230085. [Google Scholar] [CrossRef]
  16. Lee, S.S.; Mykland, P.A. Jumps in Financial Markets: A New Nonparametric Test and Jump Dynamics. Rev. Financ. Stud. 2008, 21, 2535–2563. [Google Scholar] [CrossRef]
  17. Amihud, Y. Illiquidity and stock returns: Cross-section and time-series effects. J. Financ. Mark. 2002, 5, 31–56. [Google Scholar]
  18. Goyenko, R.Y.; Holden, C.W.; Trzcinka, C.A. Do liquidity measures measure liquidity? J. Financ. Econ. 2009, 92, 153–181. [Google Scholar] [CrossRef]
  19. Drożdż, S.; Kwapień, J.; Oświęcimka, P.; Stanisz, T.; Wątorek, M. Complexity in Economic and Social Systems: Cryptocurrency Market at around COVID-19. Entropy 2020, 22, 1043. [Google Scholar] [CrossRef] [PubMed]
  20. Smales, L.A. Investor attention and the cryptocurrency market. Int. Rev. Financ. Anal. 2022, 79, 101972. [Google Scholar] [CrossRef]
Figure 1. Window definition for measurement of impact.
Table 1. Selected Twitter accounts.
| Category | Account | Followers * |
|---|---|---|
| crypto influencers | Elon Musk @elonmusk | 71.2 M |
| crypto influencers | Michael Saylor @saylor | 2.1 M |
| crypto influencers | Anthony Pompliano @APompliano | 1.4 M |
| general news publications | Reuters @Reuters | 24 M |
| general news publications | The Wall Street Journal @WSJ | 19.3 M |
| general news publications | Financial Times @FT | 4.8 M |
| crypto-specific publications | CoinDesk @CoinDesk | 2.6 M |
| crypto-specific publications | Cointelegraph @Cointelegraph | 1.5 M |
| crypto-specific publications | Bloomberg Crypto @crypto | 0.8 M |

* Number of followers as per January 2022.
Table 2. Liquidity and volume during periods with/without tweets.
Tweets/No Tweets PeriodsLog Returns-Std DevVolume-MeanAmihud-Mean
BeforeAfterBeforeAfterBeforeAfter
M1M2M1M2M1M2M1M2M1M2M1M2
BTCAll0.00098870.0009482156.66849155.266452.35 × 10−52.35 × 10−5
Positive0.00090.00100.00090.0010160.21170.54153.05163.222.38 × 10−52.3 × 10−32.36 × 10−52.25 × 10−5
Neutral0.0010.00090.00090.0009157.28154.10156.16153.362.34 × 10−52.36 × 10−52.36 × 10−52.38 × 10−5
Negative0.00090.00100.00090.0009151.67158.76153.37159.042.32 × 10−52.29 × 10−52.3 × 10−52.25 × 10−5
No tweets0.00058180.000611380.90483982.052513.13 × 10−53.22 × 10−5
ETHAll0.00114080.00110271549.44721550.843.92 × 10−63.88 × 10−6
Positive0.00110.00110.00110.00111566.491677.081539.371644.363.84 × 10−63.58 × 10−63.92 × 10−63.63 × 10−6
Neutral0.001150.001130.00110.001101553.161525.101559.491536.853.93 × 10−63.99 × 10−63.88 × 10−63.95 × 10−6
Negative0.001080.001160.001090.001041522.161590.121526.601554.843.97 × 10−63.8 × 10−63.81 × 10−63.62 × 10−6
No tweets0.00083410.000862933.32961912.167576.46 × 10−66.64 × 10−6
LTCAll0.001320.0012752046.95582059.22025.19 × 10−64.96 × 10−6
Positive0.001310.001440.001240.001242084.812159.542033.282153.795.33 × 10−64.94 × 10−65.13 × 10−64.55 × 10−6
Neutral0.001340.00130.001280.001292060.902034.762082.962054.235.09 × 10−65.27 × 10−64.91 × 10−65.05 × 10−6
Negative0.001230.001330.001280.001181964.352021.661988.622002.115.49 × 10−64.94 × 10−65.02 × 10−64.72 × 10−6
No tweets0.00094170.00095371247.58851230.79259.85 × 10−61.02 × 10−5
XRPAll0.00151410.0015058466,770.24467,447.078.29 × 10−86.82 × 10−8
Positive0.001530.001660.001640.00158500,742.2524,344.6484,714.1515,844.81.08 × 10−75.77 × 10−86.42 × 10−86.12 × 10−8
Neutral0.001520.001490.001470.00150462,710459,772.5462,725.1460,876.98.2 × 10−88.93 × 10−86.86 × 10−86.98 × 10−8
Negative0.00140.00150.00150.0014456,097.5459,633.6472,123.7466,104.96.68 × 10−86.35 × 10−87.01 × 10−86.42 × 10−8
No tweets0.00100620.001017309,911.67311,668.930.00431430.0043143

