Hashtag recommendation suggests hashtags to users while they write microblogs on social media platforms. Although researchers have investigated various methods and factors that affect the performance of hashtag recommendation on Twitter and Sina Weibo, a systematic review of these methods is lacking.
1. Introduction
Social media platforms have become fast-growing and influential media that enable people to communicate with each other easily, share information and search for exciting topics. The number of social media users was 3.6 billion in 2020, and this number is predicted to increase to 4.41 billion users in 2025 (https://www.statista.com/statistics/278414/ (accessed on 10 May 2021)). Twitter (https://about.twitter.com/company (accessed on 10 May 2021)) is a microblogging social media platform that permits users to write and share short messages of 280 characters or fewer, including hashtags, mentions and URLs. These types of short messages are referred to as “microblogs” and “tweets” [1][2][3][4][5][6][7]. Founded in 2006, Twitter has quickly become an increasingly popular and powerful tool worldwide. According to Internet Live Stats (http://www.internetlivestats.com/twitter-statistics/ (accessed on 10 May 2021)), 500 million messages on average are posted per day by 330 million active users. In July 2020 (https://www.statista.com/statistics/242606/ (accessed on 10 May 2021)), the United States had the largest audience size, with 62.55 million users, followed by Japan with 49.1 million users, and India ranked third with 17 million users. Sina Weibo (https://www.statista.com/statistics/795303/china-mau-of-sina-weibo/ (accessed on 10 May 2021)), the Chinese equivalent of Twitter, had around 523 million active users in the same year.
With information overload and growing dependence on technology, social recommendation has become a key research area. Social recommendation systems can be defined as techniques or algorithms that automatically suggest the most relevant and interesting data to social media users. Hashtag recommendation is a branch of social recommendation that proposes contemporary and relevant hashtags to users as they type tweets [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]. Choosing the correct hashtag has several benefits: it enables the user to quickly join a discussion and read tweets written by other users [24]. Using hashtags gives users a chance for their tweets to be noticed and to reach a wider audience [25]. Hashtags also help researchers to analyze users’ behaviors or to predict the outbreaks of natural disasters and epidemics [26]; they also help companies to advertise their products and improve customer service and support through users’ complaints and comments [27]. In politics, hashtags let politicians communicate with the public and advertise their campaigns [28]. Moreover, people can raise and share their voices nationally and globally [29]. For Twitter and Sina Weibo, recommending hashtags helps to enhance discussion as users are guided to use more accurate and relevant hashtags. Adopting the right hashtags helps Twitter/Sina Weibo to eliminate insignificant and noisy hashtags and reduce information overload. Automatically recommending personalized hashtags to users also helps them save time and effort in searching for relevant hashtags.
Recommending hashtags by analyzing tweets and extracting information from the Twitter/Sina Weibo hashtag universe can be a very challenging task. One of the challenges is that these tweets and hashtags are user-generated. Users tend to use informal language when writing their tweets; for example, users use “4U” to mean “for you”, “AMA” for “ask me anything” and “BFN” for “bye for now”. Spelling and grammatical mistakes are not checked or corrected. Because tweets are both short and noisy, they are more difficult to analyze than long texts. Furthermore, hashtags can be acronyms, shortened or misspelled words or a combination of words, numbers and punctuation marks. Thus, using hashtags as keywords does not necessarily convey the meaning of the discussion. The lack of control over the creation of hashtags has resulted in hundreds of hashtags being associated with a single discussion topic and different discussion topics being associated with a single hashtag.
Hashtag recommendation can be either general, when the suggested hashtags are obtained based on the data of all users, or personalized, when the suggested hashtags incorporate the user’s preferences and data. The hashtags from all the tweets in a dataset form a space known as the “hashtag space”. The suggested hashtags are said to be novel if they are not in this hashtag space (i.e., not previously used by other users). Otherwise, they are said to be predefined.
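The distinction between predefined and novel hashtags can be illustrated with a minimal Python sketch. The toy tweets, the regular expression used to extract hashtags, and the function names (`extract_hashtags`, `build_hashtag_space`, `label_suggestions`) are assumptions made for illustration; they are not taken from any of the reviewed methods.

```python
import re

# Hypothetical toy dataset; the tweets are invented for illustration.
tweets = [
    "Great match tonight #football #worldcup",
    "New phone launch today #tech",
    "Training for the marathon #running #fitness",
]

# Assumed, simplified hashtag pattern: '#' followed by word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercase hashtags found in a tweet, without the '#'."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]


def build_hashtag_space(tweets):
    """Collect every hashtag used in the dataset into one set."""
    space = set()
    for text in tweets:
        space.update(extract_hashtags(text))
    return space


def label_suggestions(suggestions, hashtag_space):
    """Label each suggested hashtag as 'predefined' or 'novel'."""
    return {
        tag: "predefined" if tag.lower() in hashtag_space else "novel"
        for tag in suggestions
    }


hashtag_space = build_hashtag_space(tweets)
labels = label_suggestions(["tech", "euro2020"], hashtag_space)
print(labels)
```

Here "tech" is labeled predefined because it already occurs in the dataset, while "euro2020" is labeled novel because no tweet in the dataset uses it.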
2. A New Taxonomy for Hashtag Recommendation of Tweets
The taxonomy classifies hashtag recommendation methods for tweets into three main categories: text-based, hybrid user-based and hybrid miscellaneous methods. Text-based methods find hashtags similar to what a user intends to adopt based on the textual information. This category is further classified into tweet-similarity-based methods, probabilistic methods, classification-based methods, graph-based methods and matrix-factorization-based methods. Since collaborative filtering methods suffer from the cold-start problem, they are integrated with other methods. Hybrid user-based hashtag recommendation methods recommend hashtags based on the similarity of the users’ behavior, interests or relations. This category is further classified into behavioral and social collaborative filtering methods. Hybrid miscellaneous hashtag recommendation methods take advantage of multiple modalities and multiple factors to recommend hashtags. Regardless of the specific techniques employed, it has become clear that the best outcomes are achieved with the hybrid methods (user-based or miscellaneous) because of their ability to overcome problems that occur with content-based and collaborative filtering methods. Understanding the various factors that affect the performance of hashtag recommendation, together with the underlying assumptions, has a significant impact on the algorithmic approach that should be considered.
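To give a feel for the simplest text-based category, the following Python sketch shows one possible tweet-similarity-based recommender. It is an illustrative assumption rather than any specific method from the reviewed literature: it represents tweets as raw term counts, compares them with cosine similarity, and scores each hashtag by the summed similarity of the `k` most similar tweets that contain it. The corpus, the tokenizer and the parameters `k` and `top_n` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word tokens of a tweet, with the hashtags removed."""
    text = re.sub(r"#\w+", " ", text.lower())
    return re.findall(r"[a-z0-9']+", text)


def hashtags(text):
    """Lowercase hashtags of a tweet, without the '#'."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", text)]


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(new_tweet, corpus, k=2, top_n=3):
    """Suggest hashtags taken from the k tweets most similar to new_tweet."""
    query = Counter(tokenize(new_tweet))
    scored = [(cosine(query, Counter(tokenize(t))), t) for t in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    scores = defaultdict(float)
    for similarity, tweet in scored[:k]:
        if similarity <= 0:
            continue  # ignore tweets that share no words with the query
        for tag in hashtags(tweet):
            scores[tag] += similarity

    # Rank by score, breaking ties alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:top_n]]


# Hypothetical toy corpus of previously posted tweets.
corpus = [
    "What a goal in the final minute #football #worldcup",
    "New phone camera is amazing #tech",
    "Ran ten kilometres before breakfast #running",
]
suggested = recommend("Late goal wins the final", corpus)
print(suggested)
```

A sketch of this kind can only return predefined hashtags, since every suggestion is copied from an existing tweet, and it ignores the user entirely, which is the gap the hybrid categories aim to fill.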
We highlight some open challenges, which can be considered future research directions. These challenges are as follows:
- Despite the advancement of the current methods, further improvements are required to propose more effective methods that are less expensive in terms of time and computation and that provide personalized recommendations covering a broader range of predefined and novel hashtags with higher accuracy. Furthermore, most previous research was tested offline. Recommending personalized hashtags in real time is more difficult because the recommended hashtags need to be accurate and delivered instantly.
- As an extension to the work presented in Alsini et al.’s paper [23], the association of the four networks and their combined effect on the performance of hashtag recommendation can be examined. In addition, rather than considering the mutual tie relationships between users, weighted relationships can be used to construct the networks and detect communities.
- It is challenging to compare newly proposed methods with baseline methods due to the variance in the size of the datasets (i.e., number of tweets, users and hashtags). Future research papers are advised to set a minimum dataset size for evaluation.
- Accuracy-based metrics were the primary measures of evaluation for a long time. In recent years, concepts of evaluation, which are metrics beyond accuracy, have been studied to evaluate the value of traditional recommendations. For example, diversity is concerned with the variety of items recommended by the system, and novelty is concerned with how new the recommended items are to users [30][31]. However, these concepts have rarely been used to evaluate hashtag recommendation methods. The value of the recommendations also needs to be studied in terms of user satisfaction and expectation.
- With the dynamic nature of social media platforms, studies of hashtag recommendation should focus more on automatically updating the data used for recommendation.
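The diversity and novelty concepts mentioned in the challenges above can be made concrete with a small Python sketch. The formulations used here are assumptions chosen for illustration and are not necessarily the definitions used in [30][31]: diversity is computed as the mean pairwise dissimilarity (one minus Jaccard similarity) between the sets of tweets in which the recommended hashtags occur, and novelty as the fraction of recommended hashtags that the user has not used before. The data and function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def intra_list_diversity(recommended, tag_to_tweets):
    """Mean pairwise dissimilarity (1 - Jaccard) of the recommended hashtags."""
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    total = sum(
        1.0 - jaccard(tag_to_tweets.get(x, set()), tag_to_tweets.get(y, set()))
        for x, y in pairs
    )
    return total / len(pairs)


def novelty(recommended, user_history):
    """Fraction of recommended hashtags the user has not used before."""
    if not recommended:
        return 0.0
    unseen = [tag for tag in recommended if tag not in user_history]
    return len(unseen) / len(recommended)


# Hypothetical data: IDs of the tweets in which each hashtag appears.
tag_to_tweets = {
    "football": {1, 2, 3},
    "worldcup": {2, 3, 4},
    "tech": {5},
}
recommended = ["football", "worldcup", "tech"]

diversity_score = intra_list_diversity(recommended, tag_to_tweets)
novelty_score = novelty(recommended, user_history={"football"})
print(round(diversity_score, 3), round(novelty_score, 3))
```

Both scores lie between 0 and 1. A list of hashtags that always co-occur scores low on diversity, and a list made up only of hashtags the user already uses scores zero on novelty, even if an accuracy-based metric would rate it highly.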