1. Introduction
In the last few years, cryptocurrencies have drawn significant attention [1]. The introduction of a decentralized currency, i.e., a currency for which no central authority is responsible for its economic value, is of paramount importance, as this feature poses serious problems for regulators, investors, and scholars alike [2]. Unlike traditional currencies, the primary function of cryptocurrencies is to cut financial institutions and intermediaries, such as brokerages, exchanges, and banks, out of transactions by means of peer-to-peer blockchain technology. Furthermore, cryptocurrencies offer other interesting features, such as low transaction costs, secure transactions and payments, and reduced control by central banks.
As a consequence, these characteristics have quickly made cryptocurrencies popular, especially among retail investors [3]. This rapid growth, however, has triggered high trading volumes, a buying frenzy among retail investors, and pump-and-dump schemes, i.e., a form of market manipulation fraud that involves artificially inflating the price of a cryptocurrency and then selling it at the inflated price to other investors [4,5]. Therefore, high volatility and herding behaviour, which are not consistent with the traditional features of either a currency or a long-term investment, are frequently observed in cryptocurrency markets. More generally, the growing importance and ballooning market capitalization of cryptocurrencies have sparked concern over potential, long-feared side effects and interplays between traditional asset classes and cryptocurrencies, especially with respect to an increase in systemic risk spillovers from the latter to the former [6].
Moreover, owing to their unique features, several studies have addressed whether cryptocurrencies exhibit desirable properties relative to other asset classes (see Figure 1). Existing empirical findings agree that certain cryptocurrencies act as good diversifiers with respect to commodities and stocks [6,7], although a growing body of studies [8,9] points out that cryptocurrencies display poor safe haven and hedge properties [10].
Figure 1. Daily returns of a selection of cryptocurrencies and market indexes.
For example, [11] showed that before the 2017 crash Bitcoin was a suitable hedge against energy commodities, whereas in the post-crash period this effect quickly faded away, leading to the conclusion that Bitcoin is at most suitable as a diversifier. A hedge is generally defined as an asset that is uncorrelated or negatively correlated with another asset or portfolio, while a diversifier is an asset that is positively but not perfectly correlated with another asset or portfolio [12].
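The correlation-based definitions above can be turned into a simple screening rule. The sketch below is illustrative only: it uses the full-sample Pearson correlation of daily returns, and the tolerance `tol` separating "uncorrelated" from "positively correlated" (and "not perfectly" from "perfectly" correlated) is a hypothetical choice standing in for the formal significance tests used in the empirical literature.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length, non-constant return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def classify(asset, reference, tol=0.05):
    """Label `asset` relative to `reference` using the hedge/diversifier definitions.

    `tol` is a hypothetical tolerance, not a value taken from the literature.
    """
    rho = pearson(asset, reference)
    if rho <= tol:
        return "hedge"        # uncorrelated or negatively correlated
    if rho < 1 - tol:
        return "diversifier"  # positively but not perfectly correlated
    return "neither"          # (almost) perfectly correlated: no diversification benefit

# Hypothetical daily returns of a reference portfolio.
portfolio = [0.010, -0.020, 0.015, 0.005, -0.010]
print(classify([-r for r in portfolio], portfolio))  # hedge
```

Note that a full-sample correlation cannot capture the regime change reported in [11]; computing it separately on the pre- and post-crash subsamples would be the natural extension.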
Consequently, a broad consensus has been reached on the speculative nature of cryptocurrencies [3,13,14,15], which makes them appealing to traders seeking to profit from potential short-term, sentiment-induced mispricings [16]. To date, predicting cryptocurrency prices has become a mainstream research question, thanks to the vast availability of free data across different market regimes. Despite being considered simplistic, perhaps the most popular valuation model among Bitcoin proponents is the stock-to-flow (S2F) model, proposed in 2019 by an anonymous institutional investor operating under the pseudonym Plan B. In essence, this model tries to predict Bitcoin prices from historical data on the ratio between the total supply of Bitcoin (the stock) and the increase in supply (the flow); both quantities are known with high precision. However, the main flaw of the model lies in the misleading assumption that scarcity directly drives future Bitcoin prices, even though the predictor is known without uncertainty. According to basic asset pricing theory, such information must already be fully discounted by the market, and therefore fully reflected in current prices, as argued, among others, by [17].
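To make the criticism concrete, the S2F ratio and the kind of regression usually associated with the model can be sketched in a few lines. The log-log (power-law) specification ln(price) = a + b·ln(S2F), estimated by ordinary least squares, is an assumption about how the model is commonly presented rather than a detail given in this text, and the numbers in the example are synthetic.

```python
import math

def stock_to_flow(stock, annual_flow):
    """S2F ratio: existing supply (stock) over new supply per year (flow)."""
    return stock / annual_flow

def fit_log_log(s2f_values, prices):
    """OLS estimate of ln(price) = a + b * ln(S2F); returns (a, b)."""
    xs = [math.log(v) for v in s2f_values]
    ys = [math.log(p) for p in prices]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def predict_price(a, b, s2f):
    """Price implied by the fitted power law for a given S2F value."""
    return math.exp(a + b * math.log(s2f))

# Synthetic example: prices generated exactly from a power law with a = 1, b = 3.
s2f_history = [10.0, 20.0, 40.0, 80.0]
prices = [math.exp(1.0) * s ** 3 for s in s2f_history]
a, b = fit_log_log(s2f_history, prices)
print(round(a, 6), round(b, 6))  # 1.0 3.0
```

Because the issuance schedule makes future S2F values known in advance, the regressor in such a fit is deterministic; this is exactly why, following the argument in [17], the information it carries should already be reflected in current prices.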
Thus, econometric and machine learning models for short-term price prediction of cryptocurrencies have blossomed in the last few years, with the goal of developing a rigorous and evidence-based framework for generating and appropriately assessing the quality of the forecasts. Various studies [18,19] have specifically proposed artificial intelligence methods for this purpose, while other contributions have suggested using them jointly with social media data analysis [20,21,22].
2. Statistics and Artificial-Intelligence-Based Price Prediction
This section will provide a brief outline of the literature on price prediction based on statistical [23,24] and artificial intelligence-based models [25], with a particular focus on information recovered from sentiment analysis [26,27], which is nowadays considered particularly valuable for forecasting purposes. A comprehensive summary is reported in Table 1.
Statistical and econometric models, although outperformed by machine learning (ML) approaches for prediction purposes [23], are generally useful because they provide a highly interpretable baseline. Against this baseline, the distinctive contribution of more sophisticated strategies can be highlighted, namely strategies that account for potential nonlinearities in the data or allow for more complex modelling, e.g., based on large predictor sets including features, their interactions, and their nonlinear transformations. In [23], the authors provide and discuss a comprehensive summary of previous studies on cryptocurrency price prediction from 2010 to 2020 and conclude that the latest contributions address the problem by focusing on ML models. These models have received increasing attention in the study of cryptocurrencies, mainly thanks to their higher accuracy.
In [24], the authors focused on econometric modelling: they identified and removed a seasonal component in hourly Bitcoin data and subsequently generated closing price predictions using a simple AutoRegressive Integrated Moving Average (ARIMA) model. The accuracy achieved was relatively low, however, suggesting room for improvement through more sophisticated modelling.
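The two steps described for [24], removing an intraday seasonal component and then fitting an ARIMA model, can be sketched without any library. The specification below is an assumption, since the model order is not reported here: an ARIMA(1,1,0) fitted by ordinary least squares to first differences from which the average hour-of-day increment has been subtracted. The function name is hypothetical, and a dedicated package such as statsmodels would normally be used instead.

```python
def seasonal_arima_110_forecast(prices, period=24):
    """One-step-ahead forecast from a de-seasonalised ARIMA(1,1,0) on hourly prices.

    Assumes at least period + 2 observations, so every hour slot has an increment.
    """
    # "I" step: first differences; diff i is observed at hour index i + 1.
    diffs = [prices[i + 1] - prices[i] for i in range(len(prices) - 1)]
    slots = [(i + 1) % period for i in range(len(diffs))]
    # Seasonal component: mean increment per hour of day minus the overall mean.
    overall = sum(diffs) / len(diffs)
    seasonal = []
    for h in range(period):
        vals = [d for d, s in zip(diffs, slots) if s == h]
        seasonal.append(sum(vals) / len(vals) - overall)
    adjusted = [d - seasonal[s] for d, s in zip(diffs, slots)]
    # "AR(1)" step: OLS regression of adjusted[t] on adjusted[t - 1].
    x, y = adjusted[:-1], adjusted[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    c = my - phi * mx
    # Forecast the next increment, add back its seasonal term, then integrate.
    next_slot = len(prices) % period
    next_diff = c + phi * adjusted[-1] + seasonal[next_slot]
    return prices[-1] + next_diff
```

As a sanity check, on a pure linear trend the sketch extrapolates the trend, and on a purely periodic intraday pattern it reproduces the next hour's seasonal increment.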
It has been widely shown that neural network models and sentiment analysis are among the most powerful strategies for prediction purposes [25], despite the former being highly parameterized. A comprehensive test of econometric, ML, and Deep Learning (DL) models for cryptocurrency price prediction was performed in [28], including ARIMA, k-Nearest Neighbors (kNN), Support Vector Regression (SVR), Random Forests (RF), Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), LSTM-GRU networks (HYBRID), Temporal Convolutional Network (TCN), and Temporal Fusion Transformer (TFT). The authors found that Recurrent Neural Networks (RNNs) with LSTM units outperformed the other strategies, as [25,29] also pointed out in the context of cryptocurrency price prediction. Furthermore, the same authors showed that DL strategies outperform ML and econometric approaches and that more complex and more heavily parameterized models tend to generate more accurate forecasts, which is somewhat surprising and inconsistent with recent cutting-edge research on ML-based equity asset pricing [30]. In a similar setting, ref. [31] examined the long-term performance of various ML-based strategies in forecasting out-of-sample directional movements of the S&P 500 and reached a conclusion in line with [30], finding that a forecasting strategy based on a shallow LSTM network was the most effective among those tested.
In [18], the authors tested both feed-forward artificial neural networks (ANNs) and more complex LSTM networks to analyze the price dynamics of Bitcoin, Ethereum, and Ripple. Surprisingly, the LSTM did not significantly outperform the ANNs in terms of accuracy, especially when the latter were fed a long history of returns as input, whereas the former handled short predictive memory lengths more efficiently. The authors concluded that cryptocurrency markets are not even weakly efficient, showing that past returns contain valuable information with predictive potential that can be exploited for profitable trading.
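The predictive memory length discussed in [18] corresponds to the number of lagged returns fed to the model. A minimal sketch of how such training pairs are typically built is shown below; the function names and the use of log returns are assumptions. A feed-forward ANN would receive each window as a flat input vector, whereas an LSTM would read the same window one time step at a time.

```python
import math

def log_returns(prices):
    """Log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def make_windows(returns, lookback):
    """Pair each run of `lookback` consecutive returns with the return that follows it.

    Pairs stay in chronological order, so any train/test split should be made by
    time (earlier windows for training) to avoid look-ahead bias.
    """
    inputs = [returns[i - lookback:i] for i in range(lookback, len(returns))]
    targets = [returns[i] for i in range(lookback, len(returns))]
    return inputs, targets

# Short versus long memory on the same hypothetical return series.
r = [0.01, -0.02, 0.03, 0.00, 0.01, -0.01]
short_X, short_y = make_windows(r, lookback=2)  # 4 samples with 2 lags each
long_X, long_y = make_windows(r, lookback=5)    # 1 sample with 5 lags
```

The example makes the trade-off visible: a longer memory gives the model more context per sample but leaves fewer samples for training.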
A more involved model was deployed by [32], who proposed a hybrid neural network combining a convolutional neural network (CNN) with an LSTM layer to forecast Bitcoin prices. In a nutshell, the CNN extracted influential features, which were then passed on to the LSTM layer for training and out-of-sample forecasting of the short-term Bitcoin price. The model also drew on a diverse set of features, including transaction data, macroeconomic variables, investor attention, and technical indicators.
As far as sentiment analysis is concerned, ref. [33] analyzed tweets about Bitcoin to assess whether they conveyed positive or negative sentiment, then used them as input to a Recursive Neural Network (RNN). In [27], the authors evaluated the predictive power of sentiment and explored statistical and deep learning methods to predict the future price of Bitcoin, contributing an analysis of financial and sentiment features extracted from economic and crowdsourced data. A novel perspective was investigated by [19], who introduced a stochastic neural network model to predict cryptocurrency prices from a broad range of features and found that social sentiment data play a key role in forecasting. More precisely, 23 features were retrieved from three main sources. First, market-based data were used, such as the number of transactions, intra-day lows and highs, market capitalization, and volume. Second, crypto-specific and blockchain-based features were taken into account, namely, mining difficulty and profitability, hashrate, transaction fees, and confirmation time. Finally, the influence of sentiment on prices was factored in by including tweet volume and Google Trends data. In [34], the authors found that sentiment analysis based on the Valence Aware Dictionary and sEntiment Reasoner (VADER) tool provides a highly valuable predictor in cryptocurrency markets. A comprehensive study covering different cryptocurrencies and an ensemble method combining cutting-edge ML algorithms showed that sentiment data and Google Trends were especially effective at forecasting short-term cryptocurrency fluctuations.
The practical value of sentiment data was assessed in [10]. The authors proposed capturing cryptocurrency market sentiment by creating an ad hoc, crypto-specific sentiment dictionary based on posts on a popular Chinese social media platform. Trend direction forecasts based on sentiment data and historical market prices showed promising results in terms of accuracy and recall compared with previous studies. Similar conclusions were reached for Dash price prediction by [35]; by contrast, ref. [36] used random forests to forecast Bitcoin prices and documented that news feeds and tweets had little predictive power. In line with [10] and [35], ref. [37] constructed an hourly sentiment index by extracting and classifying signals from Twitter to predict the price fluctuations of a small-cap alternative cryptocurrency. Forecasts based on the Extreme Gradient Boosting (XGBoost) regression tree model were found to be particularly accurate, supporting the view that sentiment analysis adds value to predictions.
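An hourly sentiment index of the kind used in [37] can be sketched as a two-step procedure: score each post, then average the scores within each clock hour. The tiny lexicon and whitespace tokenization below are hypothetical simplifications chosen for illustration; they are not the dictionary of [10], the VADER tool, or the classifier used in [37].

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical toy lexicon; real studies use curated or learned dictionaries.
LEXICON = {"moon": 1.0, "bullish": 1.0, "buy": 0.5,
           "dump": -1.0, "bearish": -1.0, "scam": -1.0}

def post_score(text):
    """Average polarity of lexicon words in a post (0.0 if none match)."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def hourly_sentiment_index(posts):
    """posts: iterable of (datetime, text). Returns {hour_start: mean score}, sorted by hour."""
    buckets = defaultdict(list)
    for ts, text in posts:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(post_score(text))
    return {h: sum(s) / len(s) for h, s in sorted(buckets.items())}

posts = [
    (datetime(2021, 5, 1, 10, 5), "Bullish buy"),
    (datetime(2021, 5, 1, 10, 40), "scam"),
    (datetime(2021, 5, 1, 11, 0), "nothing here"),
]
index = hourly_sentiment_index(posts)  # hour 10 -> -0.125, hour 11 -> 0.0
```

The resulting series would then be aligned with hourly prices and entered as a lagged feature in a forecasting model, such as the gradient boosting model mentioned above.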
Moreover, several efforts have been made to combine neural networks with text and data mining approaches. Ref. [38] created a machine learning model based on the price of Bitcoin, Google Trends data, and custom related features. To this end, the authors compared a neural network with LSTM layers, a Gradient Boosting Regression Model, and an XGBoost model; according to their results, the first approach, based on deep learning, was the most effective for prediction purposes. A combination of Twitter and Google Trends data was used as input to a simple multivariate linear regression model by [20], which proved effective at generating a signal for the price direction. In [21], the authors employed headline-based and tweet-based predictions, modelled by means of logistic regression, linear support vector machine, and naive Bayes models for a classification task, namely predicting price increases and decreases. The authors did not find a classifier that robustly outperformed the others across different cryptocurrencies, although their baseline logistic regression model performed relatively well across different datasets.
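The baseline that performed well in [21], a logistic regression classifier for the direction of the next price move, is simple enough to write from scratch. The sketch below trains it by batch gradient descent on the mean log-loss and reports accuracy; the single input feature (for instance a daily average headline sentiment score), the learning rate, and the number of epochs are hypothetical, and an established library implementation would normally be preferred in a real study.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.

    X is a list of feature vectors, y a list of labels (1 = price up, 0 = down).
    """
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_direction(w, b, x):
    """Predict 1 (increase) if P(up) >= 0.5, else 0 (decrease)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def accuracy(y_true, y_pred):
    """Share of periods whose direction was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical sentiment scores and next-day directions.
X_train = [[-0.8], [-0.4], [0.4], [0.8]]
y_train = [0, 0, 1, 1]
w, b = train_logistic(X_train, y_train)
preds = [predict_direction(w, b, x) for x in X_train]
print(accuracy(y_train, preds))  # 1.0
```

In-sample accuracy is shown only to illustrate the mechanics; comparisons such as those in [21] rely on out-of-sample evaluation.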
Similarly, ref. [26] compared Neural Network (NN), Support Vector Machine (SVM), and Random Forest (RF) models using market and Twitter data as input features, showing that sentiment alone is sufficient to generate high-quality forecasts without controlling for market features, at least for a subgroup of cryptocurrencies. Their results are consistent with [38], where the authors showed that neural networks outperform other families of models.
Table 1. Summary of the literature on statistics and artificial-intelligence-based price prediction.