Please note this is a comparison between Version 1 by Ahmad Malik and Version 2 by Sirius Huang.
During the COVID-19 spread, people shifted their physical activities to the digital world for containment of the novel disease. During this outbreak, online platforms became the prime sources of information for different purposes. Online education started and most of the larger organizations advised their employees to work from home. During this novel outbreak, it has been analyzed that some of the social media users used bots to spread fake news and manipulate data about COVID-19. Furthermore, the researcher worked to collect datasets related to the pandemic and various social media applications and published them for other researchers.
The following sections will describe how physical activities shifted to the digital world and how social media applications, such as Twitter, played an important role to spread fake news, as shown in Figure 1.
Digital activism and social media analysis techniques.
2. Digital Activism
This section covers how office activities and physical education is shifting towards the digital world and job holders as well as students are facing many issues.
2.1. Online Office Activities
The novel virus named COVID-19 compelled many organizations and individuals to operate from home instead of offices. In early 2020, when the virus spread globally, many organizations closed their operations and advised their employees to work from home. However, this is also true that many people were unemployed during the pandemic. Countries recovering from the pandemic are now reopening their offices and some of them have ensured 50% attendance of their employees per day. In 
, 50 world-leading organizations were taken under consideration to look into the actions made during the pandemic. For conducting the research, the content analysis technique was applied to social network posts and web pages to extract 77 activities that were divided into 13 sub-areas and then linked them to a five-level framework that mainly enclosed the functions, such as workforce, leadership, community-related responses, and customers. In the end, it is concluded that organizations were helpful for employees in response strategies and took suitable measurements during the pandemic.
2.2. Online Education System
Everyday, we spend a lot of our time on the Internet, in particular on social media apps such as Twitter, Facebook, WhatsApp, and Instagram. However, due to the lockdown situation created by the COVID-19 pandemic, the number of internet users surged to a record high. Most of the office and education activities are shifted from physical to online systems. Today, people are using different apps such as Zoom, Microsoft Teams, and Skype for online meetings. Therefore, it is necessary to conduct research about how the current pandemic affects our businesses, health system, and especially our education system. In 
, a qualitative paradigm along with a descriptive approach was adopted by the author to discuss the effects of COVID-19 on the education system, as well as the negative effects of online learning on the health of humans. The authors discussed that the online learning system is based on applications. These applications are very helpful, however, they are facing some issues, such as signal interference, high data consumption, and so on. Due to online learning, students are facing difficulties in understanding the basic concepts. While some students who belong to backward areas are complaining about the unavailability of the internet and poor signals during online exams. Apart from that, online learning also affects the health of students. Staring at the computer screen for a long time creates eyesight and neck problems, as well as mental problems. In the end, the authors concluded that every social media user must consider the aforementioned problems and try to overcome these problems while using online applications.
3. Social Media
This section discusses how people spread fake news over social media platforms, such as Twitter, Facebook, and other apps, during the pandemic. Moreover, it discusses what techniques are applied to extract information from social media platforms and what type of results are compiled from that information.
3.1. Social Media Search Indexes (SMSI)
Today, social media is considered one of the main pillars of life. It is also considered the rapid way of spreading any information or news. It is a fact that many websites and social media profiles are designed to pass on fake news and spread misinformation to misguide people on various issues. During the COVID-19 outbreak, many social media users and websites used bots to form and spread conspiracy theories. In 
, the authors performed a comparison between the spreading of links having low credibility information with the articles of the well-known newspaper, New York Times during an outbreak. For this study, two datasets were collected by using different techniques. The first dataset is based on tweets having a set of hashtags and links. Some of the hashtags are linked with the word coronavirus, whereas some hashtags enlighten the various aspects of the pandemic. For their study, the authors only focused on two keywords, #coronavirus and #covid19. The second dataset is based on tweets having links of low credibility of information. The links were fetched, and linked web pages were also extracted. The authors applied various content analysis and network analysis techniques to perform a comparison between tweets having links of incredible information with New York Times articles. The final results showed that the majority of content is posted either by bots or retweets. Social bots played a negative role in promoting tweets with unreliable links.
Twitter is one of the incredible sources to measure public responses. In 
, the author examined Twitter tweets about the COVID-19 pandemic that were posted by Twitter users. The author utilized an ML technique named Latent Dirichlet Allocation (LDA) to identify the famous unigrams and bigrams words. For this purpose, 4 million tweets with 20 COVID-19 related hashtags were analyzed. The final result showed that “lockdown”, “Virus” and “Quarantine” were detected as the most popular unigrams while “stay home”, “social distancing”, and “new cases” were the most common bigram words collected from Twitter data.
3.2. Misinformation on Social Media
During the COVID-19 outbreak, many false and manipulated conspiracies are being formed and spread every day on incredible social media applications, such as Twitter, Facebook, and Instagram. Most of the news is based on falsification and has no proof of validity. In 
, the study is divided into two parts and conducted to analyze how many posts were published true and how many were false about COVID-19 on social media. During the COVID-19 outbreak, misinformation was circulated in many forms, such as the idea that a novel virus was created in China to use as a biological weapon, while others claimed that this deadly virus was spread by the United States to get rid of older people who were receiving a pension from the government to decrease the economic burden. The final results of the first part of this study showed that most of the people posted about COVID-19 on social media without deliberation and they had not enough information about COVID-19 to share. The second part of this study concluded that people must think before sharing and validating the accuracy of news of COVID-19 on social media. World leaders across the world are using the Twitter platform to convey important information about the pandemic to their citizens. In 
, content analysis was performed by the authors to examine the tweets posted by the G7 leaders. The data for this study were collected from Twitter with special keywords such as “COVID-19” and “Coronavirus”. The final result of this qualitative research showed that eight out of nine verified accounts of G7 leaders participated in tweets. A total of 166 (82.8%) out of 203 were informative tweets, 14 (6.9%) were political posts and 19 (9.4%) tweets were about encouraging the people.
During the pandemic, much false news was propagated on social media platforms, among them Twitter is considered a leading platform for the spread of misinformation. To deal with this situation the author in 
performed exploratory research on the data collected from Twitter about COVID-19 misinformation. The author analyzed all the tweets, collected by the fact-checking organizations from January to the middle of July 2020. In the end, it was concluded that 1245 out of 1500 were false tweets while the remaining 276 were false claims about the COVID-19 pandemic. In 
, the author first created a dataset of fake and real news related to the topic of COVID-19. For the proposed dataset, the data were collected from social media platforms and several fact-checking websites. A total of 10,700 posts and articles of fake and real news were collected from social media platforms. Secondly, the author utilized four ML algorithms named logistic regression, support vector machine, gradient boost, and decision tree to classify these posts. The final result showed that the support vector machine delivered good accuracy (93.4%) while the logistic regression delivered (92.75%) accuracy. The authors in 
used various AI and ML models to help out social media platforms and government agencies by highlighting misinformation on social media. Several ML models, such as Naïve Bayes, LibLinear, LibShortText, and support vector machine, was used during this study. However, the final results reported that LibShortText and Posit were found to be more accurate than others. In 
, the authors used Twitter APIs to collect the data related to the COVID-19 pandemic from 1 March 2020 to 5 June 2020. A total of 85.04 million tweets were collected from 182 countries. After data collection, the authors examined this data to figure out misleading materials and misinformation based on fast-checking sources. To monitor the nature of misinformation, the analysis is displayed on the publicly available dashboard.
, the authors conducted a study for the identification of those hot topics which were posted on Twitter related to the COVID-19 pandemic. A set of tools, such as Twitter API, PostgreSQL database, and Tweepy Python library, were used to collect the words by searching them with predefined special keywords, such as “Corona”, “COVID-19”, and “virus”. Data were collected from 2 February 2020 to 15 March 2020 to conduct the research. Secondly, the authors utilized an ML algorithm named latent Dirichlet allocation to model the obtained articles. Thirdly a sentiment analysis was also performed to obtain the total number of likes, followers, and retweets for each article. The final result of this study showed that out of 2.8 million total tweets, there were 167,073 tweets posted by 160,829 Twitter unique users. A total of 12 COVID-19-related articles were identified and then these articles were categorized according to the origin, source, impact, and steps on how to combat the virus.
3.3. Impact of Rumors about COVID-19
Rumors and false conspiracies leave a very negative impact on the community in any event. During the COVID-19 pandemic, rumors, and fake conspiracies have also played a very negative role regarding mortality rate, infected patients, and the origin of COVID-19 disease. Many social media users and websites were involved in the circulation of fake news that ultimately became the cause of people’s depression and fear. In 
, 43.3 million English tweets from the United States were analyzed having the hashtag COVID-19. The main purpose of analyzing tweets was to provide early evidence of how bots are used to spread fake and political conspiracies in the United States. For this study, the authors used the combination of ML and manual validation performed by humans to find out accounts that were involved in illegal activities, such as the automation of tweets and working as bots. However, after this, time series analysis techniques were imposed to extract the aim of humans and bots. The final results of this study showed that the bots were mainly used for the promotion of political conspiracies and fake news whereas human users focused and discussed public health and welfare. In 
, a publicly available dataset of tweets published from 21 January 2020 to 31 March 2020. To extract features such as keywords and past tweets, authors used Twitter API and Tweepy in their study. This dataset consisted of 123 million tweets which contain the hashtag of COVID-19 or even the word COVID-19 in the title. However, around 60% of the tweets in this dataset belongs to the English language. This dataset could also be helpful in future to track the misinformation or rumors about COVID-19. In 
, the authors examined the data collected from social media platforms, government agencies, fact-checking websites, and news agencies to figure out the percentage of rumors, stigma, and misinformation related to the COVID-19 pandemic. The authors performed an analysis using the open-source R statistical package while the content analysis was performed for the data collected from online newspapers and articles. After the analysis, the authors reported 2311 reports of misinformation, rumors, and stigma associated with illness, mortality rate, and the vaccine.
The disaster of COVID-19 has caused significant damage the lives of all kinds of people across the globe. However, the circulation of misinformation on social media platforms misguided individuals and created confusion for them. Therefore, in this context, automated techniques are considered vital tools to identify false arguments in order to avert the propagation of misinformation. In 
, the authors offered a technique based on ML and DL algorithms to check the misinformation related to COVID-19. This study proposed the BERT technique (bidirectional encoder representation from transformers) in combination with deep learning models to find out fake news and conspiracies. In the end, the authors evaluated the BERT technique with five other ML and DL techniques and the results showed that the proposed technique outperformed the rest of the techniques by achieving 99.1% accuracy.
3.4. Collection and Publication of Datasets of Social Media
During the COVID-19 outbreak, various researchers worked to collect the datasets of social media applications such as Twitter, Facebook, Instagram and WhatsApp. The main purpose of collecting and publishing these datasets is to provide the facility for other researchers to conduct their research in the future. In 
, a Twitter dataset related to COVID-19 was collected and published for future purposes. The dataset was collected by using Twitter API dated from 22 January 2020 to 13 March 2020. This dataset is based on around 6,468,529 tweets. The main keywords used to search tweets were virus, coronavirus, ncov19, ncov2019, and COVID-19. However, this dataset includes tweets from 66 worldwide languages and the interesting fact was that the English language tweets amount to 63.4% of the total. In addition, the researchers intend to use this dataset to examine how information and misinformation were circulated via Twitter during the early stages of the pandemic.
According to rough estimates, there are hundreds of millions of tweets daily on Twitter. It is a well-known medium for publishing information and news. However, it has played an important role in the duration of seasonal diseases such as influenza 
and even more severe epidemics such as Zika 
, Ebola 
, H1N1 
, as well as the current pandemic, COVID-19. In 
, the ArCOV-19, named as Arabic COVID-19, dataset of Twitter is presented that is collected from 27 January 2020 to 30 April 2020. This is the first Arabic dataset with 1 million tweets along with propagation networks of the most liked and most retweeted tweets. After this study, ArCOV-19 has been published for other researchers to conduct their research. In 
, a dataset of Arabic tweets is collected by using Twitter API and Tweepy library built in Python from 1 January 2020 to 5 April 2020. This dataset contains around 3,934,610 million tweets. However, the dataset contains tweet features such as id of the tweet, username, hashtags used in the tweet and location from where the tweet is published. The main objective of collecting this dataset is to provide benefits to other researchers and policymakers to explore issues related to COVID-19 in future, such as the prevalence of misinformation, trending hashtags, etc.
, an Instagram dataset related to COVID-19 was presented by the authors to help the research community to study the spread of fake and real news on social media platforms. Instagram APIs were utilized to collect the public data available in the form of posts. While the Instagram Hashtag Engine was employed to gather the data linked with COVID-19 hashtags. The collection process started on 5 January 2020 and continued until 30 March 2020. A total of 5.3 thousand public posts, 392 thousand likes, and 18.5k likes were collected. The authors then organized the dataset into four categories named post contents, such as features, comments contents, and publisher information. Later on, the dataset was made public on GitHub for the researcher community. In 
, a Twitter dataset was proposed by the authors. The authors used Twitter APIs to collect data from 28 January 2020 to 1 July 2020. A total of 63 million COVID-19-related posts from 13 million users were collected. The author then utilized natural language processing and the ML algorithm for similar post-detection and tagging the posts with sentiment valence. In 
, a multilingual Twitter Dataset related to COVID-19 was proposed by the authors that is available on GitHub to help the research community. For this task, the authors utilized Twitter’s streaming API along with the Tweepy library to track the words related to COVID-19 keywords. The data collection process for this dataset was started on 28 January 2020. The authors also utilized Twitter’s search API for the collection of tweets posted in the past. The proposed dataset consisted of a total of 123 million tweets. However, the dataset is being updated regularly. In 
, the authors provided a Twitter dataset named (COVID-19-FAKES) aimed to help the research community to detect misinformation and fake news posted on Twitter. The data collection process for this dataset was started on 4 February 2020 to 10 March 2020. The authors utilized the official websites and Twitter accounts of WHO, UN, UNICEF, and fact-checking websites as a source of valid information to develop a ground-truth database. The authors utilized 13 different ML techniques to annotate the proposed dataset. The proposed dataset is made public on GitHub.