Cross-Domain Sentiment Analysis in IoT

Social media act as real-time social sensors that sense and collect diverse information; combined with sentiment analysis, this information helps IoT sensors provide the favorable data users demand in smart systems. When data labels are insufficient, cross-domain sentiment analysis aims to transfer knowledge from a source domain with rich labels to a target domain that lacks labels. Most domain adaptation methods for sentiment analysis achieve transfer learning by reducing the differences between the source and target domains, but little attention is paid to the negative transfer problem caused by invalid source domains.

  • social sensor
  • cross-domain sentiment analysis
  • multi-source selection

1. Introduction

The Internet of Things (IoT) has become a major focus in the IT industry. IoT connects physical objects to the online world, turning them into virtual intelligent objects based on sensors. To simulate real-world intelligence, millions of physical objects are interconnected through sensor devices. Today, IoT uses big data sentiment analysis to reshape, analyze, and improve sensor-based integrated information processing systems [1]. Sentiment analysis empowers IoT devices to extract more useful data from massive datasets to better understand needs and optimize services, making it an important pillar for positioning and improving IoT technology. For example, social media are real-world sensors that can be employed to gauge the pulse of societies by collecting real-time data and information from online human interactions [2][3]. A social sensor integrated with application programming interfaces for sensing news (i.e., a news sensor) combined sentiment analysis to extract sentiment information from global news and generated an interactive global threat map using geographic data [4]. Such a system may provide crucial decision support and intelligent early warning, allowing decision-makers to identify potential hazards and improve area security by monitoring environmental conditions in real time through sensors connected to the IoT [5]. Furthermore, sentiment analysis can be combined with dynamic online user recruitment [6] to understand users' sentiment tendencies and engagement levels. This information can inform user recruitment strategies, selecting users who exhibit a positive willingness to cooperate and higher levels of contribution.
Data from social sensors cover different types of themes, which can be regarded as different domains. Sentiment analysis in a single domain is usually performed by supervised learning on annotated samples from that domain, but annotation is labor-intensive and hard to adapt to new domains. As shown in Table 1, consider two domains: Book and Electronics. They share common features such as “high cost–performance ratio” while also having their own specific functional descriptions; different domains thus often exhibit both shared and private features (a minimal sketch of this split follows Table 1). Accordingly, cross-domain sentiment analysis aims to provide a generalized method that mines knowledge shared with domains that have rich sentiment labels, and then uses this knowledge for sentiment classification in domains with few or no sentiment labels.
Table 1. Reviews from the Book and Electronics domains.
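To make the shared/private distinction concrete, the following minimal sketch splits the vocabularies of two domains into shared and domain-specific word features. The review texts are hypothetical stand-ins for the entries of Table 1, not the original reviews.

```python
# Minimal sketch of the shared/private feature split across two domains.
# The review texts below are hypothetical stand-ins for Table 1's entries.
book_reviews = [
    "great story, high cost-performance ratio",
    "the plot is engaging and well written",
]
electronics_reviews = [
    "high cost-performance ratio, long battery life",
    "the screen is bright and the battery lasts",
]

def vocabulary(reviews):
    """Collect the set of word features appearing in a domain."""
    return {word for text in reviews for word in text.replace(",", "").split()}

book_vocab = vocabulary(book_reviews)
elec_vocab = vocabulary(electronics_reviews)

shared_features = book_vocab & elec_vocab   # common to both domains
book_private = book_vocab - elec_vocab      # Book-specific descriptions
elec_private = elec_vocab - book_vocab      # Electronics-specific descriptions

print(sorted(shared_features))
print(sorted(book_private))
print(sorted(elec_private))
```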
The fast development of IoT has significantly promoted sentiment analysis through the integration of big data, cloud computing, and 5G [7], and cross-domain sentiment analysis has attracted the attention of many scholars. For unsupervised cross-domain sentiment analysis, one solution is to continuously reduce the differences between the target and source domains via domain adaptation. Another is to assign weights to pre-trained source-domain classifiers based on the relationship between the target and source. From the first perspective, Remus et al. [8] proposed selecting the source-domain samples most similar to the target domain, using bag-of-words vectors and measuring similarity with the Jaccard similarity (JS) distance. Moving to neural models, Liu et al. [9] argued that adversarial training can extract purer shared features for multi-domain text classification, yielding a shared feature space that contains only common, task-invariant information without mixing in unnecessary task-specific features or redundancy. Chen et al. [10] introduced a polynomial adversarial network that learned invariant features by reducing the differences between the domains' feature distributions; under the negative log-likelihood (NLL) loss, it coincides with the model of Liu et al. [9]. Moreover, Dai et al. [11] determined the source domain closest to the target domain by minimizing the 𝒜-distance between domains.
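As a rough illustration of the sample-selection idea of Remus et al. [8], the sketch below represents texts as bag-of-words sets and ranks source samples by Jaccard similarity to a target profile. The texts and the rank-then-keep strategy are illustrative assumptions, not details of the original method.

```python
# Sketch of similarity-based source selection in the spirit of Remus et al. [8]:
# represent texts as bag-of-words sets and rank source samples by Jaccard
# similarity to the target domain. All texts here are illustrative.

def bag_of_words(text):
    return set(text.lower().split())

def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two bag-of-words sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

target_profile = bag_of_words("battery life screen resolution price")

source_samples = [
    "long battery life and sharp screen",
    "a moving story about family",
    "good price for the resolution offered",
]

# Keep the source samples most similar to the target profile.
ranked = sorted(
    source_samples,
    key=lambda s: jaccard_similarity(bag_of_words(s), target_profile),
    reverse=True,
)
print(ranked)
```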

2. Domain Adaptation

Domain adaptation is an important part of transfer learning [12], aiming to map data from the source and target domains into a common feature space where they are as similar as possible. Dredze et al. [13] noted that finding a suitable domain adaptation approach is challenging when the labeling criteria differ between domains. The source domain may be single or multiple. Single-source domain adaptation places particular emphasis on overcoming distribution mismatch and domain shift. Ghifary et al. [14] utilized the maximum mean discrepancy (MMD) metric as a regularizer between domains to alleviate distribution mismatch. With the application of deep networks, Rozantsev et al. [15] argued that the weights of corresponding layers in the source and target networks should not be shared but linked through weight regularizers, which can automatically determine whether weights are shared. Xue et al. [16] introduced deep mutual learning, using two groups of label probers with the same structure as sentiment classifiers to enable the exchange of sentiment information between groups. Multi-source domain adaptation must additionally consider how multiple source domains are combined. Guo et al. [17] combined results from multiple source domains using a point-to-set distance learned via meta-training. To increase effective knowledge sharing between source domains, Zhao et al. [18] utilized soft parameter sharing to capture sentiment representations across domains and obtained shared representations for the target by fine-tuning. Dai et al. [19] obtained the classification results of target instances directly from the source-domain classifiers, with the weight of each source domain assigned by a domain discriminator. Building on this, Li et al. [20] used weighted private features from each source domain to strengthen the learning of private features in the target domain. Furthermore, Zhang et al. [21] observed that the more similar the domain features, the more relevant the instances; feature similarity can therefore reflect relationship information more accurately than a domain discriminator [19]. In the model discussed here, the classification labels of target instances are still obtained directly from the source-domain classifiers, but the weights assigned to the source domains are set by the similarity between the target instances and the source domains.
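The MMD regularizer mentioned above can be sketched as a biased estimate of the squared maximum mean discrepancy between source and target feature batches under an RBF kernel; Ghifary et al. [14] embed such a quantity as a regularization term in the network's loss. The bandwidth and the random features below are illustrative assumptions.

```python
import numpy as np

# Sketch of a biased RBF-kernel MMD^2 estimate between two feature batches,
# usable as a domain-adaptation regularizer; bandwidth is illustrative.

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2 * x @ y.T
    )
    return np.exp(-sq_dists / (2 * bandwidth**2))

def mmd2(source, target, bandwidth=1.0):
    """Biased estimate of MMD^2 between source and target feature batches."""
    k_ss = rbf_kernel(source, source, bandwidth).mean()
    k_tt = rbf_kernel(target, target, bandwidth).mean()
    k_st = rbf_kernel(source, target, bandwidth).mean()
    return k_ss + k_tt - 2 * k_st

rng = np.random.default_rng(0)
source_feats = rng.normal(0.0, 1.0, size=(64, 16))  # source-domain features
target_feats = rng.normal(0.5, 1.0, size=(64, 16))  # shifted target features
print(mmd2(source_feats, target_feats))             # grows under domain shift
```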

3. Attention Mechanism

Attention mechanisms were first developed in computer vision and were introduced into natural language processing through machine translation [22]. Attention mechanisms are widely applied in sentiment analysis tasks and can effectively improve the classification performance of a model [23]. For example, Ji et al. [24] designed a bifurcated long short-term memory (LSTM) network using attention-based LSTM, which can extract topic and sentiment features from the source domains. Gan et al. [25] proposed a convolutional neural network (CNN)-BiLSTM model with an attention mechanism comprising global and local attention, which enhanced feature differentiation. In addition, Basiri et al. [26] proposed an attention-based bidirectional CNN-recurrent neural network (RNN) depth model that applied attention to the outputs of the bidirectional layers, allowing different words to receive varying levels of emphasis. Dai et al. [27] introduced a sentence-level attention transfer network to address the insufficient use of semantic information within the sentences of a document. However, these methods applied attention mechanisms to find key words or sentences and ignored transferable features from different source domains. Here, the attention weights are instead represented by feature similarity, which can determine the importance of different source domains. Zheng et al. [28][29] demonstrated the powerful performance of Bidirectional Encoder Representations from Transformers (BERT), a pre-trained model with an attention mechanism at its core, in language representation.
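A minimal sketch of similarity-based attention over source domains, in the spirit described above: the softmax of the scaled dot-product similarity between a target instance and each source-domain representation weights the source classifiers' predictions. All shapes, features, and predictions below are hypothetical.

```python
import numpy as np

# Sketch: attention weights over source domains derived from the feature
# similarity between a target instance and each source-domain representation.

def softmax(scores):
    scores = scores - scores.max()  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

rng = np.random.default_rng(0)
dim = 16
target_instance = rng.normal(size=dim)       # target feature vector
source_domains = rng.normal(size=(3, dim))   # one representation per source

# Scaled dot-product similarity between the target and each source domain.
similarity = source_domains @ target_instance / np.sqrt(dim)
attention = softmax(similarity)              # importance of each source domain

# Hypothetical positive-class probabilities from three source classifiers.
source_predictions = np.array([0.9, 0.4, 0.7])
weighted_prediction = attention @ source_predictions
print(attention, weighted_prediction)
```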

4. Adversarial Training

Generative adversarial networks were proposed by Goodfellow et al. [30], who used the idea of a two-person zero-sum game to reach an optimal equilibrium during training. The idea of adversarial training has also been introduced into multi-domain sentiment analysis. Ganin et al. [31] proposed an adversarial training process using gradient reversal layers, and the model was widely used in later studies. In supervised learning, adversarial training is employed to train a feature extractor that maps both the source and target domains into a shared feature space, allowing a classifier learned on the source data to transfer to the target domain. For example, Ganin et al. [32] and Zhao et al. [33] both used adversarial neural networks to extract domain-invariant features. Adversarial training can also be applied to unlabeled data: adversarial training between the classifier and the feature extractor can enhance the extraction of previously unseen features and strengthen the robustness of the classifier. Wu et al. [34] proposed a dual adversarial cooperative learning method that extracted domain-invariant features and ensured alignment between labeled and unlabeled data in each domain. Wu et al. [35] adopted standard adversarial training to learn domain-invariant features, together with virtual adversarial training and entropy minimization to reduce prediction inconsistency on unlabeled data. However, adversarial training on unlabeled data may not fully capture the true features because label information is absent, making the model's performance susceptible to variation.
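The gradient reversal layer of Ganin et al. [31] is simple to sketch: the forward pass is the identity, and the backward pass negates (and scales) the gradient, so minimizing the domain discriminator's loss simultaneously trains the feature extractor to confuse it. The scaling factor lambda_ = 1.0 below is an illustrative default, and the sum is a stand-in for a real discriminator loss.

```python
import torch

# Sketch of a gradient reversal layer (GRL): identity in the forward pass,
# gradient multiplied by -lambda in the backward pass.

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing into the feature extractor.
        return -ctx.lambda_ * grad_output, None

def grad_reverse(x, lambda_=1.0):
    return GradReverse.apply(x, lambda_)

features = torch.randn(8, 16, requires_grad=True)  # extracted features
reversed_features = grad_reverse(features)         # identity going forward
domain_loss = reversed_features.sum()              # stand-in discriminator loss
domain_loss.backward()
print(features.grad[0, :4])                        # gradients come out negated
```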