People’s preferences form the public’s collective sentiment and shape political processes such as elections, representation, and policymaking. Public opinion is a group expression or consensus of people who share the same or similar interests (MacDougall, 1952). Distinguishing interest groups in politics is challenging, because people can hold more than one preference and belong to multiple interest groups simultaneously. To date, polling has been the dominant method of assessing public opinion, and presidential approval is a prominent example of a polling estimate used to measure public opinion in politics.
1. Measuring Public Opinion in Politics
Public opinion refers to the ideas, thoughts, expressions, interests, or beliefs of particular people who are part of broader society [1][2][3]. Researchers in this area aim to understand what people think. Polls have made it possible to represent the public’s aggregated attitude and have added value to politics by providing technical and organized information to the public, politicians, and researchers [1][4][5].
Polls measure the public opinion of a target population. In his work, MacDougall (1952) clearly explained the boundaries of public opinion. Geographical distinctions define the scope of a public, which means multiple publics can exist in the world. Recent years have seen more spatial separations as many different services have become available in cyberspace. A person can have various interests and participate in different interest groups, which, according to MacDougall [1], amounts to participating in many publics. Differences in thought are another reason why there is no single public. Polls and surveys are the dominant methods of gauging mass public opinion despite their shortcomings [6][7]. Berinsky [6] emphasizes that political scientists must be cautious about their choice of sample and the questions they ask, pointing to the difficulty of creating a suitable sample for a target population and of extracting meaningful results through well-formed questions. Koo [7] describes the difficulty of obtaining a representative sample in South Korea’s political environment. His study demonstrates that young female voters are under-represented in samples used for election prediction. Both authors highlight the transformative influence of mobile phones and the Internet on people’s lifestyles as a reason for inaccurate samples.
Presidential approval has been the most widely studied measure when gauging public opinion in politics. Researchers have studied the subject since John Mueller’s seminal study in 1970. Studies of presidential approval fall under two main branches: effect and cause. Examples of the former include the influence of presidential approval on the president’s policy proposals [8], public positioning [9], presence in the legislative body [10], and legislative success. The latter branch, meanwhile, treats presidential approval as a dependent variable and examines what influences public opinion. As Mueller wrote in his book War, Presidents, and Public Opinion [11], for example, war is a driving factor for presidential approval. Prolonged war and a high death count, especially among the U.S. military, cause a decline in approval. This factor was confirmed in other studies (Gartner & Segura, 1988; Ostrom & Simon, 1985). In addition, economic conditions significantly affect presidential approval, as many studies have found [12][13][14][15].
2. Online Public Opinion and Its Methods
This study addresses the shortcomings of offline surveys by measuring the mass opinion available in cyberspace. It explores a way to extract group sentiments using user-generated texts and a deep learning technique. The literature on online public opinion has two branches. The first is a group of studies investigating the distinctive characteristics of online public opinion. This category explains the extent to which the Internet represents the public. Duggan and Brenner [16] reveal that social network platforms have different user compositions, which leads to different levels of representativeness of the general population. Moreover, this trait is not specific to the online environment in the United States. Mellon and Prosser [17] argue that British users of Twitter and Facebook differ from the general population in many factors, including age, gender, and education level. Some studies argue that in South Korea, social networks represent a particular group of people rather than the general population [18][19]. In addition, scholars have attempted to analyze the political traits of Twitter users. Cyberspace users demonstrate strong political engagement and partisanship [20][21]. Online services can underrepresent specific groups such as women, as well as certain political ideologies [21][22].
The other branch seeks to interpret political phenomena using online data. The greatest interest lies in predicting election results using social media [23][24][25]. Related studies analyze different signals to assess how well election results can be predicted. Another area of interest in politics is issue saliency. Similar to the presidential approval literature, these studies identify particular themes that influence election results: election debates [26][27] and economic status [28]. They all explain that these factors can shape elections and presidential approval.
The above studies reflect research interest in social media and its influence, yet they do not fully capture the online public’s thoughts. This gap is due to the difficulty of collecting and processing massive volumes of unstructured data. If such data could be utilized, information from the Internet could be a valuable complement to existing measures of public opinion, because people communicate continuously in the online environment and produce a constant real-time inflow of information.
Online data are fundamentally different from traditional survey data. They do not follow the question-and-answer structure of a survey [29]. Unlike in a survey, useful information is scattered and hidden within big data, a term that refers both to a vast amount of data and to a multivalent process that facilitates the combination of heterogeneous data and the extraction of valuable information for use [30]. Therefore, techniques for handling big data should differ from those used in traditional research. There are two main approaches to extracting people’s aggregated attitudes from online data: the counting method and sentiment analysis [29]. The first method simply counts texts that match a particular pattern, and it has yielded mixed results. Some studies have reported successful election predictions [25][31], while others have found that counting has little predictive power [32].
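The counting method described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the posts, the candidate labels, and the helper `mention_shares` are hypothetical and are not taken from any of the cited studies. It counts how many texts match each pattern and reports each label's share of all matches.

```python
import re
from collections import Counter


def mention_shares(texts, patterns):
    """Count texts matching each pattern; return each label's share of all matches.

    `patterns` maps a label to a regular expression (hypothetical helper).
    """
    counts = Counter()
    for text in texts:
        for label, pattern in patterns.items():
            # A text is counted at most once per label, however often it matches.
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[label] += 1
    total = sum(counts.values())
    return {label: (counts[label] / total if total else 0.0) for label in patterns}


# Invented posts, used only to exercise the function.
posts = [
    "Candidate A gave a strong speech tonight",
    "I will never vote for candidate B",
    "candidate a and candidate b both dodged the question",
    "Weather is nice today",
]
patterns = {"A": r"\bcandidate a\b", "B": r"\bcandidate b\b"}
shares = mention_shares(posts, patterns)
print(shares)
```

Note that the second post counts toward candidate B even though it is hostile. Counting ignores the direction of an opinion, which may be one reason the approach produces mixed results.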
The other approach is sentiment analysis, which uses a computer to identify the emotion expressed in a text. The analysis tool takes raw text data, tokenizes the texts, and analyzes the processed words [33]. Both supervised and unsupervised learning methods are available for sentiment analysis. The supervised method uses training data labeled with predetermined emotions, regardless of subject domain, and builds a model that predicts the sentiment of unlabeled text. Neural networks have brought substantial improvements to natural language processing, which in turn lead to better sentiment classification. Bidirectional Encoder Representations from Transformers (BERT), a pre-trained neural network, exhibits considerably higher performance than other sentiment classification tools [34]. The unsupervised method, meanwhile, relies on an established lexicon or dictionary and predefined sentiment categories. Many studies on online communication have incorporated unsupervised learning methods [34][35][36][37][38].
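The lexicon-based (unsupervised) pipeline outlined above, in which raw text is tokenized and the resulting words are scored against a dictionary, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `LEXICON`, the regular-expression tokenizer, and the function names are hypothetical and do not correspond to any established lexicon or to the BERT-based method mentioned above.

```python
import re

# Toy lexicon invented for illustration; real studies use established dictionaries.
LEXICON = {
    "good": 1, "great": 1, "support": 1,
    "bad": -1, "terrible": -1, "oppose": -1,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text, lexicon=LEXICON):
    """Sum the lexicon weights of all tokens; unknown words contribute 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def classify(text):
    """Map the summed score to one of three sentiment categories."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def net_sentiment(texts):
    """Aggregate attitude: (positive texts - negative texts) / all texts."""
    labels = [classify(t) for t in texts]
    if not labels:
        return 0.0
    return (labels.count("positive") - labels.count("negative")) / len(labels)


# Invented example texts.
texts = [
    "The president did a great job",
    "Terrible policy, I oppose it",
    "The vote is on Tuesday",
    "I support this good plan",
]
print([classify(t) for t in texts], net_sentiment(texts))
```

A sketch like this ignores negation, sarcasm, and context, which is part of the motivation for supervised models such as BERT.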