Few-shot text classification aims to recognize new classes from only a few labeled text instances. Previous studies mainly used text semantic features to model instance-level relations among a subset of samples. However, relying on a single type of relation information makes it difficult for many models to handle complicated natural language tasks. This entry presents a novel hesitant fuzzy graph neural network (HFGNN) model that explores the multi-attribute relations between samples. HFGNN is combined with the Prototypical Network (HFGNN-Proto) to achieve few-shot text classification.
1. Introduction
In recent years, the great success of deep learning has promoted the development of many fields such as computer vision and natural language processing
[1][2][3], but the effectiveness of deep learning models relies on large amounts of labeled data, and their generalization ability is severely limited when labeled data are scarce. Humans, on the other hand, learn quickly and can easily build an understanding of new things from just a few examples. This significant gap between machine learning models and human learning inspires researchers to explore few-shot learning (FSL)
[4].
Inspired by the human learning process, researchers proposed a meta-learning strategy for FSL, which exploits the distribution over similar tasks to learn how to identify unseen classes accurately and efficiently from a small amount of training data. A cross-task meta-learner learns from multiple similar tasks and provides a better initialization for unseen classes based on knowledge acquired from prior experience. One typical meta-learning method is the Prototypical Network
[5], which computes a prototypical representation for each class from the support set and assigns each query sample to the nearest prototype.
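The prototype-and-nearest-neighbor procedure above can be sketched in a few lines. This is a minimal illustration of the general Prototypical Network idea, not the authors' implementation; the function names and the toy 2-D embeddings are hypothetical.

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Class prototype = mean of the support embeddings for that class."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query sample to the nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Toy 2-way 2-shot episode with hand-made 2-D embeddings
support = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, n_classes=2)
pred = classify(np.array([[0.0, 0.1], [1.0, 0.9]]), protos)  # → [0, 1]
```

In practice the embeddings come from a learned text encoder, and training minimizes the distance between each query embedding and its true class prototype.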
Prototypical Network
[5] and its variants
[6][7][8][9] have been widely used for few-shot text classification tasks. Different from the Prototypical Network that computes class prototypes and query sample embeddings separately, MLMAN
[6] interactively encodes text based on the matching information between the query and support sets at the local and instance levels. Gao et al.
[7] proposed a hybrid attention-based prototypical network that employs instance-level attention and feature-level attention to highlight important instances and features, respectively. Sun et al.
[8] improved the Prototypical Network with multi-level cross attention at the feature, word, and instance levels. Geng et al.
[9] proposed the Induction Network that induces a better class-level representation using a dynamic routing algorithm
[10]. Although the above methods consider the intra-class similarity of the support set and the relations between the support and query sets, they ignore inter-class dissimilarity and the relations among query samples. Furthermore, these methods measure only instance-level relations while neglecting other substantive relations. Because of the complexity and diversity of textual forms, such simple relations struggle to describe the true connections between texts and may even introduce additional noise into the model.
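The instance-level attention used by the variants above can be sketched as follows: support instances that are more similar to the query receive larger weights, so noisy instances contribute less to the prototype. This is an illustrative simplification, not the exact formulation of any cited method; the function names are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_prototype(support_emb, query_emb):
    """Weight support instances by dot-product similarity to the query,
    then average them into a query-conditioned prototype."""
    scores = support_emb @ query_emb   # instance-level attention scores
    weights = softmax(scores)          # normalize to a distribution
    return weights @ support_emb       # weighted prototype
```

Here the prototype is recomputed per query, which is what distinguishes attention-based prototypes from the plain class-mean prototype.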
Instead, researchers measure the commonalities and differences among all samples in the task from multiple aspects. Inspired by the Distribution Propagation Graph Neural Network
[11], researchers introduce into few-shot text classification the distribution-level information that relates one sample to all other support samples. To better exploit relation information, HFGNN adopts a dual-graph structure consisting of an instance graph and a distribution graph. The instance graph models instance-level relations and directional relations based on instance features. The distribution graph aggregates instance-level relations to model distribution-level relations and distance relations. The relations in the researchers' model represent the similarity between samples at multiple levels. However, similarity is a fuzzy concept with no crisp definition. To handle these multiple fuzzy relations, researchers introduce hesitant fuzzy set (HFS)
[12] theory, which handles multi-attribute decision-making problems well, into the HFGNN model. Researchers design membership functions corresponding to the multi-attribute relations for a comprehensive evaluation that avoids losing relation information. In addition, researchers use a linear function that further strengthens stronger relations and weakens weaker ones, helping the model generate more reasonable graph structures and providing inductive biases for enhancing instance features. Finally, a prototypical network takes the instance features generated by HFGNN as input and quickly classifies query samples. HFGNN-Proto adopts an episodic strategy
[13] for end-to-end meta-training, giving it stronger generalization ability and the capacity to adapt to new classes without retraining.
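The fusion step described above — combining several fuzzy relation scores per edge and then linearly sharpening them — can be sketched as follows. This is a rough illustration under assumed forms: the mean stands in for a hesitant-fuzzy aggregation operator, and the threshold/slope of the linear function are made-up values, not the paper's.

```python
import numpy as np

def sharpen(a, alpha=1.5, thresh=0.5):
    """Linear function (illustrative): amplify relation strengths above
    `thresh`, suppress those below it, then clip back into [0, 1]."""
    return np.clip(thresh + alpha * (a - thresh), 0.0, 1.0)

def fuse_relations(inst_sim, dist_sim):
    """Fuse instance-level and distribution-level membership degrees for
    each edge; the mean is a stand-in for a hesitant-fuzzy aggregation."""
    return sharpen((inst_sim + dist_sim) / 2.0)

# A strong pair of relations gets stronger; a weak pair gets weaker.
fuse_relations(np.array([0.9]), np.array([0.7]))  # → [0.95]
fuse_relations(np.array([0.2]), np.array([0.2]))  # → [0.05]
```

The sharpened edge weights would then serve as the adjacency structure over which instance features are propagated and enhanced.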
2. Few-Shot Learning
Early studies mainly applied fine-tuning
[14] and data augmentation
[15] to alleviate the overfitting problem caused by insufficient training data but achieved unsatisfactory results. In the meta-learning strategy
[16][17][18][19], transferable knowledge that can guide model learning is extracted from various tasks, so that the model acquires the ability of learning to learn. Current meta-learning methods mainly include optimization-based methods
[18][19][20] and metric learning
[5][13][21]. Some representative few-shot learning methods and their corresponding descriptions are listed in
Table 1.
Table 1. Representative few-shot learning methods and their descriptions.
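The episodic strategy common to these meta-learning methods samples many small N-way K-shot tasks from the training classes. The sketch below shows one way such an episode could be constructed; the data layout (a dict of class → examples) and function name are assumptions for illustration.

```python
import random

def sample_episode(data, n_way=2, k_shot=1, q_query=1, seed=0):
    """Build one N-way K-shot episode from {class_name: [examples]}:
    pick n_way classes, then split k_shot support and q_query query
    examples per class, relabeling classes 0..n_way-1 within the episode."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(data), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = rng.sample(data[cls], k_shot + q_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

corpus = {"sports": ["s1", "s2"], "politics": ["p1", "p2"], "tech": ["t1", "t2"]}
support, query = sample_episode(corpus, n_way=2, k_shot=1, q_query=1)
```

Training on thousands of such episodes, each with fresh class labels, is what forces the model to learn a transferable comparison procedure rather than fixed class boundaries.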
2.2. Graph Neural Network
Graph neural networks (GNNs) were originally designed to process graph-structured data. GNNs can efficiently handle data structures containing complex relations and discover potential connections between data with the ability to transform and aggregate neighbors. Some of the GNN models for few-shot tasks are listed in Table 2.
Table 2. Some of the current GNN models for few-shot tasks and their descriptions.
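The transform-and-aggregate operation that gives GNNs their power can be sketched as one mean-aggregation message-passing layer. This is a generic illustration of neighbor aggregation, not the layer used by any specific model in Table 2; the function name is hypothetical.

```python
import numpy as np

def gnn_layer(adj, feats, weight):
    """One message-passing layer: average each node's neighbor features
    (including itself via self-loops), then apply a linear map and ReLU."""
    adj_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = adj_hat.sum(axis=1, keepdims=True)
    agg = (adj_hat @ feats) / deg          # mean over neighbors
    return np.maximum(agg @ weight, 0.0)   # linear transform + ReLU

# Two connected nodes with one-hot features end up with blended features.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
out = gnn_layer(adj, np.eye(2), np.eye(2))  # → [[0.5, 0.5], [0.5, 0.5]]
```

Stacking such layers lets information flow along multi-hop paths, which is how GNNs discover the potential connections between samples mentioned above.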
However, these models are all designed for image classification tasks and transfer only instance-level relations in the GNN, which makes it difficult to handle elusive NLP tasks. In contrast, the HFGNN model proposed in this entry considers relations between samples from multiple perspectives, and these accurate and sufficient relations help the model construct more discriminative features.
3. Multi-Criteria Decision-Making
Zadeh
[35] proposed fuzzy set theory to address problems involving fuzzy, subjective, and imprecise judgments. However, the theory lacks the ability to solve multi-criteria decision-making (MCDM) problems. To address this, Torra
[12] proposed the hesitant fuzzy set (HFS), which determines the corresponding evaluation index and membership function according to the different attributes of the elements in the universe. HFS is a powerful tool for solving problems involving many uncertainties.
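In an HFS, an element's membership is a set of several possible degrees rather than a single value. A common way to compare hesitant fuzzy elements is by a score function such as the mean of the degrees; the minimal sketch below uses that convention for a toy multi-criteria ranking. Function names, the mean-score choice, and the data are illustrative assumptions.

```python
def hfs_score(hesitant_degrees):
    """Score of a hesitant fuzzy element: mean of its possible
    membership degrees (one common score function among several)."""
    return sum(hesitant_degrees) / len(hesitant_degrees)

def best_alternative(alternatives):
    """Pick the alternative whose hesitant fuzzy elements (one per
    criterion) have the largest total score."""
    return max(alternatives,
               key=lambda name: sum(hfs_score(h) for h in alternatives[name]))

# Two alternatives rated on two criteria; each rating is a set of
# hesitant membership degrees (e.g. from several disagreeing experts).
ratings = {"A": [[0.6, 0.8], [0.5]],
           "B": [[0.3], [0.4, 0.2]]}
best_alternative(ratings)  # → "A"
```

In HFGNN, the multi-attribute relation scores between two samples play the role of the criteria, and the designed membership functions map them into degrees that are aggregated in this hesitant-fuzzy fashion.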
In recent years, some more efficient MCDM methods have been proposed. Deveci et al.
[36] explored a novel approach that integrates the Combined Compromise Solution (CoCoSo) with type-2 neutrosophic numbers to handle the challenging decision process in urban freight transportation tasks. Pamucar et al.
[37] developed a novel integrated decision-making model based on Measuring Attractiveness by a Categorical Based Evaluation TecHnique (MACBETH) for calculating criteria weights and the Weighted Aggregated Sum Product ASsessment (WASPAS) method, under a fuzzy environment with Dombi norms.
Considering the operating efficiency of the graph neural network model and the simplicity and effectiveness of HFS theory, researchers introduce HFS theory, rather than more complex MCDM methods, into the dual graph neural networks to fuse the relations between few-shot examples.