TChecker: Fake News Detection on Social Media

The spread of fake news on social media continues to be one of the main challenges facing internet users, preventing them from discerning authentic from fabricated pieces of information. Detecting fake news is a problem tackled through different approaches that can be categorized mainly into a content-based approach and a social-based approach. In the content-based approach, the textual features are the main features, whereas in the social-based approach other features, including users’ engagements, users’ profile features, and network propagation features, are considered.

fake news
social media
news
BERT
BiLSTM

1. Introduction

In the era of Web 2.0, our interactions as well as our perceptions of information are changing. The ways of communication have evolved in recent decades so quickly and radically that some of the traditional ways of acquiring information are being pushed into obsolescence. One of the most observable changes is in the media: in the recent past, television, radio, and newspapers were the primary credible sources of news and information for everyone. Reporters used to race to the locations of events to get the first scoop. Live TV coverage of important events could get millions of people watching their TVs at the same time. Headlines in newspapers could change stock market values. Nowadays, it is very rare to find someone reading a newspaper or watching news on TV; instead, social media is becoming the most prominent source for obtaining information about a news event.

Extracting information from social media has become a very rich area of research, as it has become one of the fastest growing sources of data about almost everything. Web 2.0 and social media enabled internet users to contribute content as they use these services, which allowed anyone to post and share data about anything and in turn created a huge repository of data for everyone to access. However, with the power given to everyone to post and share anything on social media comes great responsibility for the content being posted or shared. Unfortunately, social media users do not usually validate or fact-check their posts before sharing them, as they tend to believe what is shared many times within their circle of friends on social media. This phenomenon has been studied in the literature and is known as the validity effect ^[1], where people tend to have more belief in what is shared through their close circles as this emphasizes their feeling of validation. In addition, users tend to share more posts that are aligned with their ideas and previous knowledge, regardless of the truthfulness of these ideas, which is known as confirmation bias ^[2]. These combined phenomena give individuals a false sense of credibility about any piece of news that is shared in their circle of acquaintances, so they share it themselves with other circles, and so on. This in turn reduces the credibility of the news sources themselves, which most often rely on social media as well to gather information ^[3].

A study presented in ^[4] on Twitter in the period between 2006 and 2017 showed that fake information spreads faster and wider than true information. The authors found that fake news is 70% more likely to be retweeted, therefore reaching people much faster. The effect of spreading fake news across different social media platforms can be disastrous in many aspects of life. It can bias political campaigns and decisions, as happened in the “Brexit” referendum ^[5], the 2016 US elections ^[6], and the 2020 US elections as well. One rumor can cause the stock market to lose millions of dollars. For example, a rumor about former US president Barack Obama being injured in an explosion cost the stock market millions of dollars ^[7]. With the emergence of the COVID-19 pandemic, many rumors spread across social networks about home remedies, off-the-shelf chemicals, and deadly side effects of new vaccines, putting the lives of those who believed them at risk ^[8].

2. Fake News Detection on Social Media

2.1. Content-Based Approach

The most widely adopted approach to identifying fake news is the content-based approach. In this approach, the textual features of the news are used in different models to identify the veracity of the news. This approach has been widely applied in detecting fake news from news posts and social posts.

Fake News Detection from News Articles

Verifying the truthfulness of news is a crucial step in publishing it, and checking the credibility of the source of information is an indispensable part of the publishing process. In the media domain, verifying the news and its sources is the job of journalists and others who work in the field. Journalists usually check the information against credible sources and verify that it is true before publishing it; this is known as manual fact-checking.

With the increase in the volume of data circulating on the internet every second, automatic techniques stepped in to help in the fact-checking process. Natural language processing and information retrieval techniques are applied to automatically identify fake news. A binary translating embedding (B-TransE) model was introduced in ^[9] to detect fake news based on a knowledge base graph; the authors evaluated their model by applying it to check the news in the dataset “Getting real about fake news” provided by Kaggle. CompareNet ^[10] is an end-to-end graph neural model that compares fake news against a knowledge base using entities.
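The translating-embedding idea behind knowledge-graph checks of this kind can be illustrated with the standard TransE score, in which a triple (head, relation, tail) is considered plausible when head + relation lies close to tail in the embedding space. The sketch below is not the B-TransE model of ^[9]; the entity names, the relation, and the two-dimensional embeddings are invented purely for illustration.

```python
import math


def transe_score(head, relation, tail):
    """Return the L2 distance ||head + relation - tail||.

    A lower score means the triple is more plausible under TransE.
    """
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical 2-d embeddings, invented for this example only.
entity = {
    "paris": [1.0, 0.0],
    "france": [1.0, 1.0],
    "japan": [-1.0, 1.0],
}
relation = {"capital_of": [0.0, 1.0]}

# A triple that agrees with the (toy) knowledge base scores low...
plausible = transe_score(entity["paris"], relation["capital_of"], entity["france"])
# ...while a contradicting triple scores higher.
implausible = transe_score(entity["paris"], relation["capital_of"], entity["japan"])
```

In a fact-checking setting, triples extracted from a news item would be scored this way against embeddings learned from a trusted knowledge base, and unusually high scores would be treated as a signal of possible fabrication.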

The style-based approach relies on the content of the news post to detect its truthfulness based on the style of writing in the post. The style of writing would reveal the user’s intention to post false or true information. The style of the writing is represented as features to be fed to the model for detecting the truthfulness of the post. This approach was used by ^[11]^[12]^[13] in deception detection, where deception is defined as the bad intention of authors to post intentionally false information. Those features were used in detecting fake news from news articles in ^[14]^[15].

Different machine learning algorithms have been applied to detect fake news from news posts through applying classification models on the textual content of the news. An analysis of different classifiers’ performance on the LIAR dataset is presented in ^[16]; a comparison between the performance of Naïve Bayes, SVM, Random Forest, logistic regression, and stochastic gradient classifier is presented, showing that the classifiers obtained similar results except for the stochastic gradient classifier, which performed worse than the others. Another comparison between the Naïve Bayes, Random Forest, passive aggressive, and LSTM models is presented in ^[17]; the authors applied these models on a dataset consisting of 11,000 English articles labeled as fake or real. They showed that the passive aggressive classifier with TF-IDF representation could achieve the highest accuracy and F1 score, while the LSTM model could achieve almost the same accuracy, but it obtained a higher precision compared to that achieved by the passive aggressive classifier.
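To make the TF-IDF plus passive aggressive combination concrete, the following is a minimal sketch written with the Python standard library only. It is not the implementation evaluated in ^[17]: the four toy headlines, their labels, the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration, and the update shown is the basic passive aggressive rule without an aggressiveness cap.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (tf times smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors


def dot(weights, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(t, 0.0) * v for t, v in x.items())


def train_passive_aggressive(vectors, labels, epochs=5):
    """Basic PA rule: update only when the hinge loss is positive."""
    weights = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):  # y is +1 (fake) or -1 (real)
            loss = max(0.0, 1.0 - y * dot(weights, x))
            if loss > 0.0:
                # Smallest step that brings this example's margin back to 1.
                tau = loss / sum(v * v for v in x.values())
                for t, v in x.items():
                    weights[t] = weights.get(t, 0.0) + tau * y * v
    return weights


# Toy headlines and labels, invented for illustration (+1 = fake, -1 = real).
docs = [
    "shocking miracle cure doctors hate",
    "council approves annual budget",
    "miracle pill melts fat overnight",
    "court publishes annual report",
]
labels = [1, -1, 1, -1]

vectors = tfidf_vectors(docs)
weights = train_passive_aggressive(vectors, labels)
predictions = [1 if dot(weights, x) > 0 else -1 for x in vectors]
```

The classifier is "passive" when an example is already classified with a margin of at least 1 and "aggressive" otherwise, which makes it cheap to train on large streams of articles. The sketch assumes non-empty documents; an empty string would need to be filtered out before vectorization.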

A classification model based on BiLSTM and self-attention layers is presented in ^[18] and applied on a dataset provided by Kaggle that consists of news articles labeled as fake or not. The articles are represented using GloVe ^[19] embeddings, then fed to the BiLSTM layer, followed by the self-attention layer, and then finally the classification layer. The authors compared their model to other models using different text representations, such as TF-IDF and BOW, and different neural networks, such as GRU, LSTM, and CNN.
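The self-attention step in such a pipeline can be sketched as scaled dot-product attention over the sequence of hidden states. The version below is a simplification that uses the states directly as queries, keys, and values, with no learned projection matrices; the exact layer used in ^[18] may differ, and the hidden states shown are invented stand-ins for BiLSTM outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention without learned projections.

    Each output vector is a weighted average of all input states, where the
    weights come from the similarity between the current state and the others.
    """
    dim = len(states[0])
    outputs = []
    for query in states:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in states
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, states)) for i in range(dim)]
        )
    return outputs


# Hypothetical hidden states for a three-token sequence (stand-ins for BiLSTM outputs).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(hidden_states)

# Mean-pool the attended states into one vector for a downstream classifier.
pooled = [sum(vec[i] for vec in attended) / len(attended) for i in range(2)]
```

A classification layer would then map the pooled vector to a fake or real label. In a trained model the queries, keys, and values are usually produced by learned linear maps, which is the main thing this sketch leaves out.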

Upon the introduction of BERT in 2019 as a pre-trained language model using deep bidirectional transformers, a major change in performance in NLP tasks occurred. BERT differs from other deep learning embeddings, such as word embeddings and sentence/document embeddings, in that it produces context-dependent representations: the vector for a word depends on the words around it.

An evaluation of different language models, including BERT, RoBERTa ^[20], and DistilBERT ^[21], is presented in ^[22]; the authors also compared different architectures of neural network models, including a simple fully connected network, a CNN, and a combined CNN and RNN. They applied the models on different datasets of long and short text, including Twitter datasets. Their study showed that simple neural network models can perform better than sophisticated models, and amongst the language models, RoBERTa performed slightly better on most of the datasets. However, all language models’ performances were close to each other. Following the content-based approach, a BERT model followed by LSTM and fully connected layers is presented in ^[23]; the authors showed that vanilla BERT models could perform better than other content-based models. Moreover, adding the LSTM layer improved the performance of the model on news titles from the PolitiFact portion of the FakeNewsNet dataset ^[24]. Korean fake news was detected by ^[25] using a BERT-based model trained on the Korean language and BiLSTM for classification. A combination of three parallel blocks of 1D convolutional neural networks and a BERT model applied on news articles from a dataset collected during the 2016 US elections, provided by Kaggle, is presented in ^[26]. The model showed better performance than models using GloVe representations and other model architectures based on LSTM and CNN individually.

2.2. Fake News Detection from Social Media

Research on detecting fake news from social media is increasing. Different approaches are applied to detect fake news from social media posts in order to mitigate the harm caused by their spread. Early in 2016, a dataset was collected from Twitter by ^[27] at the time of five major events reported by journalists. A model combining a convolutional neural network and LSTM was applied to this dataset and the results are presented in ^[28]. The LSTM model achieved better results in terms of accuracy and F1 score than the CNN alone and the LSTM and CNN combined.

With the emergence of the COVID-19 pandemic, researchers from different disciplines tried to find ways to reduce the effect of the pandemic worldwide. One way was to detect false information circulating on social media. A dataset discussing different topics about the COVID-19 virus was collected from Twitter and annotated by ^[29]. The authors used Bag of Words (BOW) and n-grams to represent the text of tweets. They applied an ensemble of Naïve Bayes (NB), K Nearest Neighbor (KNN), Random Forest, function-sequential minimal optimization (SMO), and voted perceptron (VP). They found that the highest F1 score was achieved by the vote ensemble classifier. Different machine learning models applied to classify a set of collected tweets about COVID-19 are presented in ^[30]. The results showed that Random Forest could outperform other models like SVM, linear regression, and LSTM.
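The two ingredients described for ^[29], an n-gram bag-of-words representation and a vote over several classifiers, can be sketched as follows. This is only an illustration of the general idea: the example tweet is invented, and the three rule-based "voters" are hypothetical stand-ins for trained models such as NB, KNN, or Random Forest.

```python
from collections import Counter


def ngram_bag(text, max_n=2):
    """Count all word n-grams of length 1..max_n in a whitespace-tokenized text."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def toy_votes(bag):
    """Three hypothetical rule-based voters standing in for trained classifiers."""
    return [
        "fake" if bag["cures covid"] > 0 else "real",
        "fake" if bag["garlic"] >= 2 else "real",
        "fake" if bag["vaccine"] > 0 else "real",
    ]


# Invented example tweet.
bag = ngram_bag("garlic cures covid garlic")
label = majority_vote(toy_votes(bag))
```

Using an odd number of voters with two classes avoids ties; with an even number, a tie-breaking rule (for example, preferring the most confident classifier) would need to be chosen explicitly.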

Different BERT-based models were used to detect misinformation from tweets, especially regarding COVID-19. COVID-Twitter-BERT is introduced in ^[31]; it is a BERT model pre-trained on a large corpus of tweets about COVID-19. The model is very topic-specific and can be useful in various NLP tasks related to representing tweets about COVID-19, such as detecting sentiments of tweets in ^[32]. Another COVID-19 dataset was collected from Twitter, and ensemble machine learning was applied in ^[33].

BERTweet followed by an output classification layer was used in ^[34] to detect misinformation from tweets. The model outperformed other text representation models such as GloVe. BERTweet was compared with BERT cased and uncased pretrained models for detecting fake tweets about COVID-19 in ^[35]. BERTweet showed the best performance among the different BERT models. For Arabic tweets, Ref. ^[36] presented a deep learning model based on AraBERT, which is a BERT model trained on Modern Standard Arabic, to represent the tweets. The model uses the tweets’ text and user features to detect the veracity of the tweets. BiLSTM and CNN networks were used for classification, and they showed close performance.

2.3. Social-Based Approach

In an attempt to better understand fake news through users’ comments, an explainable fake news detection model is presented in ^[37]. The model relies on identifying comments that explain the core parts of the news article and why they are fake or not. It is based on a co-attention network between the news articles and their users’ comments. The authors also applied a ranking method to pick the most explanatory comments. They compared their work on the PolitiFact and GossipCop datasets to other content-based models using news articles only, and to models that consider users’ comments. They showed that their model could achieve better results than the models in the comparison.

Capturing features from comments and using them along with a two-level convolutional neural network learning representation from news content is presented in ^[38] as TCNN-URG. They use a conditional variational autoencoder for user comment generation to assist the news content classifier when user comments do not exist.

Integrating the features of the text of the news article, users’ comments on it, and the source of the news is presented in ^[39] as the CSI model. The model consists of three parts: the first one uses LSTM to capture the temporal representation of the article, the second part represents the user features, and the third part concatenates the results of the earlier parts into a classification model. The experiments were performed on Twitter and Weibo datasets ^[40] and showed better results than content-based models.

TriFN is a tri-relationship embedding framework proposed by ^[41] that represents publisher–news relations and user–news interactions as embeddings and uses them together to detect fake news. They applied their model on PolitiFact and BuzzFeed datasets.

A deep diffusive network model that incorporates the article’s textual content, along with its creator and the subject of the news article, is proposed in ^[42]. The latent features are extracted from the articles’ text, their creators, and the subjects. The model uses a gated diffusive unit that accepts multiple inputs from different sources at the same time.

This entry is adapted from the peer-reviewed paper 10.3390/app132413070

References

Boehm, L.E. The validity effect: A search for mediating variables. Personal. Soc. Psychol. Bull. 1994, 20, 285–293.
Nickerson, R.S. Confirmation bias: A ubiquitous phenomenon in many guises. Rev. Gen. Psychol. 1998, 2, 175–220.
Yuan, L.; Jiang, H.; Shen, H.; Shi, L.; Cheng, N. Sustainable Development of Information Dissemination: A Review of Current Fake News Detection Research and Practice. Systems 2023, 11, 458.
Vosoughi, S.; Roy, D.; Aral, S. The spread of true and false news online. Science 2018, 359, 1146–1151.
Pogue, D. How to Stamp Out Fake News. Sci. Am. 2017, 316, 24.
Allcott, H.; Gentzkow, M. Social Media and Fake News in the 2016 Election. J. Econ. Perspect. 2017, 31, 211–236.
Rapoza, K. Can ‘Fake News’ Impact the Stock Market? Section: Investing. Available online: https://www.forbes.com/sites/kenrapoza/2017/02/26/can-fake-news-impact-the-stock-market/?sh=129496f92fac (accessed on 10 September 2023).
Cinelli, M.; Quattrociocchi, W.; Galeazzi, A.; Valensise, C.M.; Brugnoli, E.; Schmidt, A.L.; Zola, P.; Zollo, F.; Scala, A. The COVID-19 social media infodemic. Sci. Rep. 2020, 10, 16598.
Pan, J.Z.; Pavlova, S.; Li, C.; Li, N.; Li, Y.; Liu, J. Content Based Fake News Detection Using Knowledge Graphs. In Proceedings of the Semantic Web—ISWC 2018, Monterey, CA, USA, 8–12 October 2018; Vrandečić, D., Bontcheva, K., Suárez-Figueroa, M.C., Presutti, V., Celino, I., Sabou, M., Kaffee, L.A., Simperl, E., Eds.; Lecture Notes in Computer Science. Springer International Publishing: Cham, Switzerland, 2018; pp. 669–683.
Hu, L.; Yang, T.; Zhang, L.; Zhong, W.; Tang, D.; Shi, C.; Duan, N.; Zhou, M. Compare to The Knowledge: Graph Neural Fake News Detection with External Knowledge. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 754–763.
Siering, M.; Koch, J.A.; Deokar, A.V. Detecting Fraudulent Behavior on Crowdfunding Platforms: The Role of Linguistic and Content-Based Cues in Static and Dynamic Contexts. J. Manag. Inf. Syst. 2016, 33, 421–455.
Zhang, D.; Zhou, L.; Kehoe, J.L.; Kilic, I.Y. What Online Reviewer Behaviors Really Matter? Effects of Verbal and Nonverbal Behaviors on Detection of Fake Online Reviews. J. Manag. Inf. Syst. 2016, 33, 456–481.
Braud, C.; Søgaard, A. Is writing style predictive of scientific fraud? arXiv 2017, arXiv:1707.04095.
Bond, G.D.; Holman, R.D.; Eggert, J.A.L.; Speller, L.F.; Garcia, O.N.; Mejia, S.C.; Mcinnes, K.W.; Ceniceros, E.C.; Rustige, R. ‘Lyin’ Ted’, ‘Crooked Hillary’, and ‘Deceptive Donald’: Language of Lies in the 2016 US Presidential Debates. Appl. Cogn. Psychol. 2017, 31, 668–677.
Potthast, M.; Kiesel, J.; Reinartz, K.; Bevendorff, J.; Stein, B. A Stylometric Inquiry into Hyperpartisan and Fake News. arXiv 2017, arXiv:1702.05638.
Agarwal, V.; Sultana, H.P.; Malhotra, S.; Sarkar, A. Analysis of Classifiers for Fake News Detection. Procedia Comput. Sci. 2019, 165, 377–383.
Rohera, D.; Shethna, H.; Patel, K.; Thakker, U.; Tanwar, S.; Gupta, R.; Hong, W.C.; Sharma, R. A Taxonomy of Fake News Classification Techniques: Survey and Implementation Aspects. IEEE Access 2022, 10, 30367–30394.
Mohapatra, A.; Thota, N.; Prakasam, P. Fake news detection and classification using hybrid BiLSTM and self-attention model. Multimed. Tools Appl. 2022, 81, 18503–18519.
Pennington, J.; Socher, R.; Manning, C. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Stroudsburg, PA, USA, 2014; pp. 1532–1543.
Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2020, arXiv:1910.01108.
Anggrainingsih, R.; Hassan, G.M.; Datta, A. Evaluating BERT-Based Pre-Training Language Models for Detecting Misinformation. arXiv 2022, arXiv:2203.07731.
Rai, N.; Kumar, D.; Kaushik, N.; Raj, C.; Ali, A. Fake News Classification using transformer based enhanced LSTM and BERT. Int. J. Cogn. Comput. Eng. 2022, 3, 98–105.
Shu, K.; Mahudeswaran, D.; Wang, S.; Lee, D.; Liu, H. FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media. arXiv 2019, arXiv:1809.01286.
Lee, J.W.; Kim, J.H. Fake Sentence Detection Based on Transfer Learning: Applying to Korean COVID-19 Fake News. Appl. Sci. 2022, 12, 6402.
Kaliyar, R.K.; Goswami, A.; Narang, P. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 2021, 80, 11765–11788.
Zubiaga, A.; Liakata, M.; Procter, R. Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media. arXiv 2016, arXiv:1610.07363.
Ajao, O.; Bhowmik, D.; Zargari, S. Fake News Identification on Twitter with Hybrid CNN and RNN Models. In Proceedings of the 9th International Conference on Social Media and Society, Melbourne, Australia, 15–20 July 2018; ACM: Copenhagen, Denmark, 2018; pp. 226–230.
Olaleye, T.; Abayomi-Alli, A.; Adesemowo, K.; Arogundade, O.T.; Misra, S.; Kose, U. SCLAVOEM: Hyper parameter optimization approach to predictive modelling of COVID-19 infodemic tweets using smote and classifier vote ensemble. Soft Comput. 2022, 27, 3531–3550.
Jeyasudha, J.; Seth, P.; Usha, G.; Tanna, P. Fake Information Analysis and Detection on Pandemic in Twitter. SN Comput. Sci. 2022, 3, 456.
Müller, M.; Salathé, M.; Kummervold, P.E. COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter. arXiv 2020, arXiv:2005.07503.
Lin, H.Y.; Moh, T.S. Sentiment analysis on COVID tweets using COVID-Twitter-BERT with auxiliary sentence approach. In Proceedings of the 2021 ACM Southeast Conference, ACM SE ’21, Online, 15–17 April 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 234–238.
Dadgar, S.; Ghatee, M. Checkovid: A COVID-19 misinformation detection system on Twitter using network and content mining perspectives. arXiv 2021, arXiv:2107.09768.
Kumar, A.; Jhunjhunwala, N.; Agarwal, R.; Chatterjee, N. NARNIA at NLP4IF-2021: Identification of Misinformation in COVID-19 Tweets Using BERTweet. In Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda, Online, 6 June 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 99–103.
Kim, M.G.; Kim, M.; Kim, J.H.; Kim, K. Fine-Tuning BERT Models to Classify Misinformation on Garlic and COVID-19 on Twitter. Int. J. Environ. Res. Public Health 2022, 19, 5126.
Alyoubi, S.; Kalkatawi, M.; Abukhodair, F. The Detection of Fake News in Arabic Tweets Using Deep Learning. Appl. Sci. 2023, 13, 8209.
Shu, K.; Cui, L.; Wang, S.; Lee, D.; Liu, H. dEFEND: Explainable Fake News Detection. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; ACM: New York, NY, USA, 2019; pp. 395–405.
Qian, F.; Gong, C.; Sharma, K.; Liu, Y. Neural User Response Generator: Fake News Detection with Collective User Intelligence. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; International Joint Conferences on Artificial Intelligence Organization: Stroudsburg, PA, USA, 2018; pp. 3834–3840.
Ruchansky, N.; Seo, S.; Liu, Y. CSI: A Hybrid Deep Model for Fake News Detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; ACM: New York, NY, USA, 2017; pp. 797–806.
Ma, J.; Gao, W.; Mitra, P.; Kwon, S.; Jansen, B.J.; Wong, K.F.; Cha, M. Detecting rumors from microblogs with recurrent neural networks. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, New York, NY, USA, 9–15 July 2016; AAAI Press: New York, NY, USA, 2016; pp. 3818–3824.
Shu, K.; Wang, S.; Liu, H. Beyond News Contents: The Role of Social Context for Fake News Detection. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, VIC, Australia, 11–15 February 2019; ACM: New York, NY, USA, 2019; pp. 312–320.
Zhang, J.; Dong, B.; Yu, P.S. FakeDetector: Effective Fake News Detection with Deep Diffusive Neural Network. In Proceedings of the 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA, 20–24 April 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1826–1829.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.