Few-shot text classification aims to recognize new classes with only a few labeled text instances. Previous studies mainly utilized text semantic features to model instance-level relations among a subset of samples. However, such single-relation information makes it difficult for many models to address complicated natural language tasks. A novel hesitant fuzzy graph neural network (HFGNN) model that explores multi-attribute relations between samples is proposed, and it is combined with the Prototypical Network (HFGNN-Proto) to achieve few-shot text classification.
1. Introduction
In recent years, the great success of deep learning has promoted the development of numerous fields such as computer vision and natural language processing [1,2,3], but the effectiveness of deep learning models relies on a large amount of labeled data. The generalization ability of deep learning models is severely limited when labeled data are scarce. Humans, on the other hand, can learn quickly and easily build awareness of new things from just a few examples. This significant gap between machine learning models and human learning inspires researchers to explore few-shot learning (FSL) [4].
Inspired by the human learning process, researchers proposed a meta-learning strategy for FSL, which utilizes the distribution of similar tasks to learn how to identify unseen classes accurately and efficiently with a small amount of training data. A cross-task meta-learner learns from multiple similar tasks and provides better initialization for unseen classes based on knowledge acquired from prior experience. One typical meta-learning method is the Prototypical Network [5], which computes the class prototype representation of the support set and classifies each query sample to the nearest prototype.
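As a minimal sketch of this classification rule (the encoder, tensor shapes, and function names below are illustrative assumptions, not the original implementation), prototype computation and nearest-prototype assignment can be written as:

```python
import torch

def prototypical_classify(support_emb, support_labels, query_emb, n_classes):
    """Classify queries by distance to class prototypes.

    support_emb:    [n_support, dim] encoded support instances
    support_labels: [n_support] class indices in [0, n_classes)
    query_emb:      [n_query, dim] encoded query instances
    """
    # Prototype of each class = mean embedding of its support instances.
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])                                               # [n_classes, dim]
    # Negative squared Euclidean distance as the class score.
    dists = torch.cdist(query_emb, prototypes) ** 2  # [n_query, n_classes]
    return (-dists).argmax(dim=1)                    # nearest prototype wins
```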
Prototypical Network [5] and its variants [6,7,8,9] have been widely used for few-shot text classification tasks. Different from the Prototypical Network, which computes class prototypes and query sample embeddings separately, MLMAN
[6] interactively encodes text based on the matching information between the query and support sets at the local and instance levels. Gao et al.
[7] proposed a hybrid attention-based prototypical network that employs instance-level attention and feature-level attention to highlight important instances and features, respectively. Sun et al.
[8] improved the Prototypical Network with multi-cross attention at the feature, word, and instance levels. Geng et al.
[9] proposed the Induction Network that induces a better class-level representation using a dynamic routing algorithm
[10]. Although the above methods consider the intra-class similarity of the support set and the relations between support and query samples, they ignore inter-class dissimilarity and the relations between query samples. Furthermore, these methods measure only instance-level relations while neglecting other substantive relations. Due to the complexity and diversity of textual forms, such simple relations can hardly describe the true connections between texts and may even introduce additional noise into the model.
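The dynamic routing step [10] used by the Induction Network above can be sketched roughly as follows (a simplified capsule-style routing; the squash function and agreement update here are assumptions about the general technique, not the exact Induction Network procedure):

```python
import torch

def dynamic_routing(sample_vecs, iterations=3):
    """Induce one class vector from k support sample vectors [k, dim]."""
    b = torch.zeros(sample_vecs.size(0))        # routing logits, one per sample
    for _ in range(iterations):
        d = torch.softmax(b, dim=0)             # coupling coefficients
        c_hat = (d.unsqueeze(1) * sample_vecs).sum(dim=0)   # weighted sum [dim]
        norm = c_hat.norm()
        c = (norm ** 2 / (1.0 + norm ** 2)) * (c_hat / (norm + 1e-8))  # squash
        b = b + sample_vecs @ c                 # raise logits of agreeing samples
    return c
```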
Instead, we measure the commonalities and differences among all samples in a task from multiple aspects. Inspired by the Distribution Propagation Graph Neural Network [11], we introduce the distribution-level information of each sample relative to all other support samples into few-shot text classification. To better exploit relation information, HFGNN adopts a dual-graph structure consisting of instance graphs and distribution graphs. The instance graph models instance-level relations and directional relations based on instance features. The distribution graph aggregates instance-level relations to model distribution-level relations and distance relations. The relations in
our model represent the similarity between samples at multiple levels. However, similarity is a fuzzy concept without a clear definition. To handle these multiple fuzzy relations, we introduce hesitant fuzzy set (HFS) theory [12], which handles multi-attribute decision-making problems well, into the HFGNN model.
We design membership functions corresponding to the multi-attribute relations for comprehensive evaluation, which avoids the loss of relation information. In addition, we use a linear function that further strengthens stronger relations and weakens weaker ones, helping the model generate more reasonable graph structures and providing inductive biases for the enhancement of instance features. Finally, a prototypical network takes the instance features generated by HFGNN as input and quickly classifies query samples. HFGNN-Proto adopts an episodic strategy
[13] for meta-training in an end-to-end manner. It has a stronger generalization ability and can adapt to new classes without retraining.
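A rough sketch of how such multi-attribute relations could be fused (the sigmoid membership functions, the averaging step, and the linear enhancement below are illustrative placeholders, not the exact formulation of HFGNN):

```python
import torch
import torch.nn.functional as F

def fuse_relations(instance_rel, direction_rel, distribution_rel, distance_rel,
                   alpha=1.5, beta=0.1):
    """Fuse multi-attribute relation matrices into one graph adjacency.

    Each *_rel argument is an [N, N] matrix of pairwise relation scores for
    the N samples in a task; the membership functions and the linear
    enhancement here are illustrative placeholders.
    """
    # Map each raw relation to a [0, 1] membership degree (one hesitant
    # fuzzy element per attribute and sample pair).
    memberships = [torch.sigmoid(r) for r in
                   (instance_rel, direction_rel, distribution_rel, distance_rel)]
    # Aggregate the hesitant fuzzy elements, here simply by averaging.
    fused = torch.stack(memberships).mean(dim=0)       # [N, N]
    # Linear function that strengthens strong relations and weakens weak
    # ones, clamped back into [0, 1].
    enhanced = torch.clamp(alpha * fused - beta, 0.0, 1.0)
    # Row-normalize so the matrix can serve as an adjacency for aggregation.
    return F.normalize(enhanced, p=1, dim=1)
```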
2. Related Work
2.1. Few-Shot Learning
Early studies mainly applied fine-tuning
[14] and data augmentation
[15] to alleviate the overfitting problem caused by insufficient training data, but achieved unsatisfactory results. In the meta-learning strategy [16,17,18,19], transferable knowledge that can guide the learning of models is extracted from various tasks, so that the model acquires the ability of learning to learn. Current meta-learning methods mainly include optimization-based methods [18,19,20] and metric learning [5,13,21]. Some representative few-shot learning methods and their corresponding descriptions are listed in Table 1.
Table 1. Representative few-shot learning methods and their descriptions.

Type | Core Idea | Method | Description
Optimization-based Methods | This type of approach aims to learn to optimize the model parameters given the gradients computed from the few-shot examples. | MAML [18] | MAML learns a set of initialization parameters from which one or a few gradient-adjustment steps can quickly adapt the model to new tasks with only a small amount of data.
 | | SNAIL [20] | A novel combination of temporal convolution and soft attention that learns the optimal optimization strategy.
 | | ATAML [22] | ATAML facilitates task-agnostic representation learning through task-agnostic parameterization and enhances the model's adaptability to specific tasks through an attention mechanism.
Metric Learning | In metric-based methods, instances are mapped into a feature space, the distances between the query and support sets are measured, and classification is completed using the nearest-neighbor concept. | Siamese Network [23] | The Siamese network contains two parallel neural networks trained to extract pair-wise sample features; the Euclidean distance between the features is measured.
 | | Matching Network [13] | It generates a weighted K-nearest-neighbor classifier based on the cosine distance between sample features.
 | | Relation Network [21] | Different from the Siamese network and Matching network, which adopt a single fixed metric, the Relation network compares relations with a nonlinear metric learned by a neural network.
 | | MsKPRN [24] | MsKPRN extends the Relation Network to be position-aware and integrates multi-scale features.
 | | MSFN [25] | MSFN learns a multi-scale feature space, and similarities between the multi-scale features and class representations are computed.
 | | Adaptive Metric Learning Model [26] | Yu et al. [26] proposed an adaptive metric learning model that automatically determines the best weighted combination of a set of metrics obtained by meta-learning for emerging few-shot tasks.
 | | Knowledge-Guided Relation Network [27] | Sui et al. [27] proposed a knowledge-guided metric model that uses external knowledge to imitate human knowledge and generates relation networks that can apply different metrics to different tasks.
Other Methods | There is no unified core idea for these methods; they solve few-shot tasks in different ways, but all achieve competitive results. | BERT-PAIR [28] | BERT-PAIR combines the query with each support sample into a sequence and utilizes BERT [29] to predict whether each pair expresses the same class.
 | | LsSAML [30] | It utilizes the information implied by class labels to assist pretrained language models in extracting more discriminative features.
 | | SALNet [31] | This method trains a classifier from labeled data through an attention mechanism, collects lexicons containing important words for each category, and then uses new data labeled by the combination of the classifier and lexicons to guide the learning of the classifier.
2.2. Graph Neural Network
Graph neural networks (GNNs) were originally designed to process graph-structured data. GNNs can efficiently handle data structures containing complex relations and discover potential connections between data through their ability to transform and aggregate neighbor information. Some of the GNN models for few-shot tasks are listed in Table 2.

Table 2. Some of the current GNN models for few-shot tasks and their descriptions.

Model | Description
Simple GNN [32] | Garcia et al. [32] constructed a graph model in which the query and all support samples are closely connected and used a node-focused GNN to transfer instance-level relations and label information.
TPN [33] | This method further considers the relations among query samples.
EGNN [34] | It adopts an edge-labeling framework to explicitly model the intra-class similarity and inter-class dissimilarity of samples and dynamically updates node and edge features to achieve complex information interactions.
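To make the transform-and-aggregate operation shared by these models concrete, a minimal generic message-passing layer might look like the following (a sketch of standard GNN machinery, not any specific model listed above):

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One generic GNN layer: transform neighbor features, then aggregate."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.transform = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        """node_feats: [N, in_dim]; adj: [N, N] row-normalized adjacency."""
        messages = self.transform(node_feats)  # transform each node's features
        aggregated = adj @ messages            # weighted sum over neighbors
        return torch.relu(aggregated)          # nonlinearity
```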
However, these models are all designed for image classification tasks, and they transfer only instance-level relations in GNNs, which makes it difficult to handle elusive NLP tasks. In contrast, the HFGNN model proposed in this study considers relations between samples from multiple perspectives, and these accurate and sufficient relations help the model construct more discriminative features.
2.3. Multi-Criteria Decision-Making
Zadeh
[35] proposed fuzzy set theory to address problems involving fuzzy, subjective, and imprecise judgments. However, this theory lacks the ability to solve multi-criteria decision-making (MCDM) problems. In this regard, Torra
[12] proposed HFS, which determines the corresponding evaluation index and membership function according to the different attributes of the elements in the universe. HFS is a powerful tool for solving problems involving many uncertainties.
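As a toy illustration of how an HFS supports multi-criteria decisions (the criteria, membership values, and mean-based score function below are invented for the example, not taken from the paper):

```python
# Hesitant fuzzy element (HFE): the set of possible membership degrees of
# an alternative under several criteria. All values below are invented.
hfe_a = [0.6, 0.8, 0.7]   # alternative A judged under three criteria
hfe_b = [0.5, 0.9, 0.4]   # alternative B judged under three criteria

def score(hfe):
    """A common HFE score function: the mean of its membership degrees."""
    return sum(hfe) / len(hfe)

# The alternative with the higher score is preferred.
best = "A" if score(hfe_a) > score(hfe_b) else "B"
print(best, score(hfe_a), score(hfe_b))
```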
In recent years, some more efficient MCDM methods have been proposed. Deveci et al.
[36] explored a novel approach that integrates the Combined Compromise Solution (CoCoSo) with type-2 neutrosophic numbers to overcome the challenging decision process in urban freight transportation tasks. Pamucar et al. [37] developed a novel integrated decision-making model that combines Measuring Attractiveness by a Categorical Based Evaluation TecHnique (MACBETH), for calculating the criteria weights, with the Weighted Aggregated Sum Product ASsessment (WASPAS) method under a fuzzy environment with Dombi norms.
Considering the operating efficiency of the graph neural network model and the simplicity and effectiveness of HFS theory, we introduce HFS theory, rather than other more complex MCDM methods, into the dual graph neural networks to fuse the relations between few-shot samples.