- Analysis of social impact on crypto price moves
Is crypto just hype? There are two opposing opinions about cryptocurrencies: some people say ‘it is the future’, while others claim that it is just ‘hype’. Exactly which metrics influence the cryptocurrency market is still uncertain, but we may understand it better by tracking social responses such as posts on Twitter, Reddit, and YouTube. In this analysis, we will see what most affects cryptocurrency prices and whether we can predict price moves using external information. Additionally, we will build a crypto forecasting pipeline based on the findings.
Today, historical crypto data is easy to retrieve thanks to services like CoinMarketCap, Nomics, Messari, LunarCrush and others. Further details on gathering crypto data are explained by Nicholas Resendez in Top 5 Cryptocurrency APIs for Developers. Based on our exploratory analysis, LunarCrush has the most helpful features for our purposes, and obtaining historical data is exceptionally straightforward:
| id | name | symbol |
| 1 | Bitcoin | BTC |
| 2 | Ethereum | ETH |
| 3 | XRP | XRP |
| 4 | Litecoin | LTC |
| 5 | Bitcoin Cash | BCH |
| 6 | Binance Coin | BNB |
| 7 | Tether | USDT |
| 8 | EOS | EOS |
| 9 | Bitcoin SV | BSV |
| 10 | Stellar | XLM |
| 11 | Cardano | ADA |
To get detailed historical data about a cryptocurrency, we use the assets API:
The collected dataset contains 58 different data features and their historical values, including the ones we wish to explore: ‘reddit_posts’, ‘tweets’, ‘tweet_sentiment’, ‘social_score’, ‘social_impact_score’, ‘medium’, ‘youtube’, ‘social_contributors’, ‘social_volume’ and others. (Of course, we must place a high degree of trust in the way LunarCrush collects and calculates these metrics.)
Let’s glance at the correlation between the closing price per day and other factors:
As we see, Reddit posts and tweets show some correlation with the close price of BTC. However, correlation only measures the linear relationship between metrics at the same timestamp, so it may not help in predictive modeling. Next, we’ll use models for feature selection and get insights into building prediction models.
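A minimal sketch of this correlation check, assuming the collected data sits in a pandas DataFrame with one row per day. The function name `correlation_with_target` and the synthetic columns are illustrative and not taken from the original notebook:

```python
import numpy as np
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str = "close") -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic stand-in for the daily data (not real market values).
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, 200).cumsum()
demo = pd.DataFrame({
    "close": close,
    "tweets": 2 * close + rng.normal(0, 0.1, 200),  # built to track the price
    "reddit_posts": rng.normal(0, 1, 200),          # unrelated noise
})
ranked = correlation_with_target(demo)
print(ranked)
```

Sorting by absolute value keeps strongly negative correlations near the top, which a plain descending sort would push to the bottom.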
Based on the correlation analysis and common sense, metrics like ‘open’, ‘close’, ‘high’ and ‘low’ are highly correlated, so we are interested in predicting only one of them. ‘Close’ is our target value, and the others will be removed from the dataset. Some columns also have too many NULL values, so we’ll remove them from the analysis as well.
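This clean-up step could look roughly like the sketch below. The cut-off `max_null_share=0.5` is an assumption chosen for illustration, since the post does not state how many NULL values count as too many, and `prepare_dataset` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def prepare_dataset(df, redundant=("open", "high", "low"), max_null_share=0.5):
    """Drop price columns that duplicate the target and columns that are mostly empty."""
    df = df.drop(columns=[c for c in redundant if c in df.columns])
    null_share = df.isna().mean()  # fraction of missing values per column
    return df.loc[:, null_share <= max_null_share]


# Tiny illustrative frame (made-up values).
raw = pd.DataFrame({
    "open": [1.0, 2.0, 3.0, 4.0],
    "high": [1.5, 2.5, 3.5, 4.5],
    "low": [0.5, 1.5, 2.5, 3.5],
    "close": [1.2, 2.2, 3.2, 4.2],
    "tweets": [10, 12, np.nan, 15],            # 25% missing: kept
    "medium": [np.nan, np.nan, np.nan, 1.0],   # 75% missing: dropped
})
clean = prepare_dataset(raw)
print(list(clean.columns))
```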
To measure how useful each feature is for predicting the cryptocurrency price from historical data, we’ll create lagged values for all the input features, given a look back period and a prediction step. The result is an input dataset that is ready for supervised learning models. It contains M columns (where M is the number of features multiplied by the number of lags, counting the current value as lag 0) and N rows (where N is the number of original observations minus the look back period). The idea is to use all the historical lagged values of social impact and market value to predict the price of the cryptocurrency.
Now we are ready to create lagged features for both the target value and the exogenous variables. With a bit of data preparation, we can build a dataset ready for supervised learning.
Number of features in Original dataset 54
Target of prediction close
Total Number of generated features 1674. Examples of generated feature names: Index([‘close-30’, ‘volume-30’, ‘market_cap-30’, ‘url_shares-30’,‘unique_url_shares-30’, ‘reddit_posts-30’, ‘reddit_posts_score-30’, ‘reddit_comments-30’, ‘reddit_comments_score-30’,…
Input dataset is built using the last 30 days of historical data for 1 day ahead.
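The lagging step described above can be sketched as follows. This is an illustrative reconstruction, not the original helper: the name `make_lagged_dataset` and its parameters are assumptions, and only the `feature-lag` column naming follows the generated names listed above. Reading the lags as 0 through 30 is consistent with the reported counts, since 54 features × 31 lags = 1674 columns.

```python
import numpy as np
import pandas as pd


def make_lagged_dataset(df, target="close", look_back=30, step_ahead=1):
    """Turn a daily time series table into a supervised-learning dataset."""
    lagged = {
        f"{col}-{lag}": df[col].shift(lag)
        for lag in range(look_back, -1, -1)  # oldest lag first, lag 0 = current day
        for col in df.columns
    }
    X = pd.DataFrame(lagged, index=df.index)
    y = df[target].shift(-step_ahead)          # value `step_ahead` days in the future
    valid = X.notna().all(axis=1) & y.notna()  # need full history and a known target
    return X[valid], y[valid]


# Small synthetic example: 40 days, 2 features, 3-day look back.
days = pd.date_range("2021-01-01", periods=40, freq="D")
demo = pd.DataFrame(
    {"close": np.arange(40.0), "tweets": np.arange(40.0) * 10}, index=days
)
X, y = make_lagged_dataset(demo, look_back=3, step_ahead=1)
print(X.shape)
print(list(X.columns))
```

Note that this sketch also drops the final `step_ahead` rows, because their future target is not yet known.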
The correlation matrix of the created features, filtered to values above 0.75, shows that to predict the close price of Bitcoin we should use lagged values of close, market_cap and market_cap_global. No lagged social values meet this threshold. By this measure, not even tweets from Elon Musk or other influencers move the BTC price; the best predictors are its historical price and market value.
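One way to apply such a threshold, shown here as a sketch on synthetic data. Using the absolute value of the correlation is an assumption, since the post only says "bigger than 0.75", and `highly_correlated_features` is a hypothetical name:

```python
import numpy as np
import pandas as pd


def highly_correlated_features(X, y, threshold=0.75):
    """Names of the columns whose absolute correlation with y exceeds the threshold."""
    corr = X.corrwith(y)
    return corr[corr.abs() > threshold].index.tolist()


# Synthetic target plus one informative and one uninformative feature.
rng = np.random.default_rng(1)
y = pd.Series(rng.normal(0, 1, 300))
X = pd.DataFrame({
    "close-0": y + rng.normal(0, 0.1, 300),  # nearly identical to the target
    "tweets-0": rng.normal(0, 1, 300),       # independent noise
})
selected = highly_correlated_features(X, y)
print(selected)
```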
In the next step, we’ll try to predict the Close value using the standard XGBoost model in three different configurations: All Features, Highly Correlated Features, and Model-Selected Features (using the sklearn SelectFromModel technique). We will then compare the results. The last 30 days of data are used for testing, and the rest of the historical data is used to train the model.
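The split and the model-based selection can be sketched as below. To keep the example dependent on scikit-learn only, it uses `GradientBoostingRegressor` as a stand-in; `xgboost.XGBRegressor` follows the same scikit-learn estimator interface and could be substituted. The helper names, the `"mean"` importance threshold and the synthetic data are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel


def split_last_n(X, y, n_test=30):
    """Chronological split: the final n_test rows form the test set."""
    return X.iloc[:-n_test], X.iloc[-n_test:], y.iloc[:-n_test], y.iloc[-n_test:]


def fit_with_selected_features(X_train, y_train, threshold="mean"):
    """Select features by model importance, then refit on the selected columns."""
    selector = SelectFromModel(
        GradientBoostingRegressor(random_state=0), threshold=threshold
    )
    selector.fit(X_train, y_train)
    selected = X_train.columns[selector.get_support()].tolist()
    model = GradientBoostingRegressor(random_state=0).fit(X_train[selected], y_train)
    return model, selected


# Synthetic data in which only "close-0" carries signal.
rng = np.random.default_rng(0)
n = 200
cols = ["close-0", "close-1", "tweets-0", "tweets-1", "volume-0", "volume-1"]
X = pd.DataFrame(rng.normal(size=(n, len(cols))), columns=cols)
y = pd.Series(3.0 * X["close-0"] + rng.normal(0, 0.1, n))

X_train, X_test, y_train, y_test = split_last_n(X, y, n_test=30)
model, selected = fit_with_selected_features(X_train, y_train)
pred = model.predict(X_test[selected])
print(selected)
```

The split is chronological rather than random so that the model is never trained on days that come after the ones it is tested on.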
Should we scale the values?
Depending on the model, we should definitely think about scaling the features. XGBoost is not sensitive to monotonic transformations of its features, for the same reason that decision trees and random forests are not: any split of a node on one scale has a corresponding split on the transformed scale. Hence, we decided to proceed without scaling. This may, however, need to be reconsidered for other models.
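The invariance argument can be checked on a small synthetic example. The sketch below uses a single scikit-learn decision tree rather than XGBoost and compares predictions on the training points only, where the two trees partition the samples identically:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# One feature on a wide scale and a noisy nonlinear target (synthetic data).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 1000.0, 300)
y = np.sin(x / 100.0) + rng.normal(0, 0.1, size=x.size)

raw = x.reshape(-1, 1)
logged = np.log(x).reshape(-1, 1)  # strictly increasing transform of the feature

tree_raw = DecisionTreeRegressor(max_depth=4, random_state=0).fit(raw, y)
tree_log = DecisionTreeRegressor(max_depth=4, random_state=0).fit(logged, y)

# The ordering of samples is unchanged, so both trees group them the same way.
same_predictions = np.allclose(tree_raw.predict(raw), tree_log.predict(logged))
print(same_predictions)
```

For unseen points that fall between two training values, the split thresholds can land in slightly different places on the two scales, so the equivalence is exact only in terms of how the training samples are grouped.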
To assess the model’s performance, we’ll use mean absolute error (MAE) and mean absolute percentage error (MAPE). A detailed implementation is available on GitHub.
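For reference, both metrics can be written in a few lines of NumPy. This is a generic sketch of the standard definitions, not the implementation from the repository; the numbers are made up:

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes y_true has no zeros."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mae(actual, predicted), mape(actual, predicted))
```

MAE is expressed in price units, so it is not comparable between coins with very different prices; MAPE is relative, which makes it the more convenient metric for comparing errors across coins.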
Input dataset is built using last 30 days of historical data for predicting 1 day ahead
1 features are selected based on the threshold of importance [‘close-0’]
Models comparison using MAE and MAPE
Based on the experiment, the best model for one-day-ahead prediction of BTC is the one that uses only the previous value of the close target. So for BTC, we found no better way to predict the price than to use the original price time series itself.
Will the conclusion made above work for different cryptos?
We ran the same experiment for other top-10 currencies by market value, according to CoinMarketCap. Let’s see what we’ve got.
2 features are selected based on the threshold of importance [‘close-0’, ‘close-3’]
25 features are selected based on the threshold of importance [‘close-17’, ‘market_cap-16’, ‘price_btc-1’, ‘close-8’, ‘close-0’, ‘close-1’, ‘social_volume-0’, ‘price_btc-7’, ‘market_cap-0’, ‘reddit_comments-11’, ‘alt_rank_30d-10’, ‘social_volume_global-11’, ‘reddit_comments-12’, ‘tweet_sentiment4-19’, ‘reddit_comments-10’, ‘close-19’, ‘volume_24h_rank-11’, ‘volume-20’, ‘reddit_posts-17’, ‘market_cap_global-1’, ‘social_volume_global-28’, ‘tweet_sentiment_impact3-25’, ‘close-4’, ‘volume-26’, ‘reddit_comments_score-18’]
4 features are selected based on the threshold of importance
[‘market_cap-0’, ‘price_btc-24’, ‘close-0’, ‘price_btc-23’]
19 features are selected based on the threshold of importance
[‘close-10’, ‘price_btc-0’, ‘market_cap-5’, ‘market_cap-4’, ‘price_btc-9’, ‘market_cap-3’, ‘market_cap-16’, ‘price_btc-1’, ‘tweet_spam-0’, ‘market_cap-18’, ‘close-19’, ‘reddit_posts-18’, ‘price_btc-3’, ‘close-0’, ‘close-1’, ‘price_btc-16’, ‘market_cap_global-28’, ‘tweets-6’, ‘tweet_quotes-7’]
3 features are selected based on the threshold of importance
[‘market_cap-0’, ‘close-0’, ‘market_cap-2’]
8 features are selected based on the threshold of importance
[‘close-1’, ‘close-0’, ‘social_volume_global-26’, ‘tweet_replies-0’, ‘tweets-0’, ‘correlation_rank-26’, ‘reddit_posts-11’, ‘market_cap_global-30’]
Comparison of prediction errors
As we can see, the impact of social media activity on crypto price movement is not consistent and depends on the coin itself. For well-known and more stable coins, like BTC and ETH, the best predictors are their historical price and market value. On the other hand, the prices of coins with a smaller market cap depend on many factors, including tweets, Reddit posts, social volume and the BTC price. Every cryptocurrency’s price is best predicted with the same feature selection method: “Model-Selected Features”. Tailoring the configuration to each cryptocurrency helps improve the accuracy of crypto forecasting with XGBoost. As a result, we were able to decrease the prediction error for all the tested examples.
The next step will be to try other machine learning models with the selected features to improve the forecasting accuracy. In this post we’ve built a one-step forecast only, and it will be interesting to see how this approach works for multi-step and long-term forecasts.
Thanks to Jayni Chopda for collaboration on this blog post.
- Date of publication:
- Thu, 11/25/2021 - 13:45