ProMatch: Semi-Supervised Learning with Prototype Consistency

Semi-supervised learning (SSL) methods have made significant advancements by combining consistency-regularization and pseudo-labeling in a joint learning paradigm. The core concept of these methods is to identify consistency targets (pseudo-labels) by selecting predicted distributions with high confidence from weakly augmented unlabeled samples. 

  • semi-supervised
  • pseudo-label
  • prototype consistency

1. Introduction

In the past few decades, machine learning has demonstrated remarkable success across various visual tasks [1,2,3,4,5,6,7,8]. This success can be attributed to advancements in learning algorithms and the availability of extensive labeled datasets. However, in real-world scenarios, constructing large labeled datasets is costly and often impractical, so learning effectively from a limited number of labeled samples has become a major concern. Semi-supervised learning (SSL) [9,10,11], an important branch of machine learning, has emerged as a promising solution to this challenge by leveraging the abundance of unlabeled data.
The goal of SSL is to enhance generalization performance by exploiting the potential of unlabeled data. One widely accepted assumption, the Low-Density Separation Assumption [12], posits that the decision boundary should lie in low-density regions of the input space. Building on this assumption, two prominent paradigms have emerged as effective ways to leverage unlabeled data: pseudo-labeling [11] and consistency regularization [13]. Consistency-regularization-based methods, which are widely adopted in SSL, aim to keep network outputs stable when the inputs are perturbed [13,14]. However, they rely heavily on extensive data augmentation, which may limit their effectiveness in domains such as video and medical imaging. Pseudo-labeling-based methods, in contrast, select high-confidence predictions on unlabeled samples as training targets (pseudo-labels) [11]. A notable advantage of pseudo-labeling is its simplicity: it does not require multiple data augmentations and can be readily applied across domains.
In recent work, combining pseudo-labeling with consistency regularization has shown promising results [15,16,17,18]. The underlying idea is to train a classifier on labeled samples and use its predicted distributions as pseudo-labels for unlabeled samples. These pseudo-labels are typically generated from weakly augmented views [16,19] or by averaging predictions over multiple strongly augmented views [9]. The objective is then a cross-entropy loss between the pseudo-labels and the predictions obtained from different strongly augmented views. The pseudo-labels are often sharpened or converted to hard labels with argmax, assigning each instance to a specific category to further refine the learning process.
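The sketch below illustrates this combination under simple assumptions: a hypothetical `model` that maps a batch of images to class logits, and pre-computed weakly and strongly augmented views `x_weak` and `x_strong` of the same unlabeled batch. It is a minimal illustration of the idea, not any specific published implementation.

```python
import torch
import torch.nn.functional as F

def sharpen(probs: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Temperature sharpening of a predicted distribution (MixMatch-style)."""
    p = probs ** (1.0 / temperature)
    return p / p.sum(dim=1, keepdim=True)

def combined_unlabeled_loss(model, x_weak, x_strong, use_hard_labels=True):
    """Cross-entropy between pseudo-labels from the weak view and the
    prediction on the strong view of the same unlabeled batch."""
    with torch.no_grad():  # pseudo-labels are treated as fixed targets
        probs_weak = F.softmax(model(x_weak), dim=1)
        if use_hard_labels:
            targets = probs_weak.argmax(dim=1)   # hard (one-hot) pseudo-labels
        else:
            targets = sharpen(probs_weak)        # soft, sharpened pseudo-labels
    logits_strong = model(x_strong)              # gradients flow only through this pass
    if use_hard_labels:
        return F.cross_entropy(logits_strong, targets)
    return -(targets * F.log_softmax(logits_strong, dim=1)).sum(dim=1).mean()
```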

2. Consistency Regularization

Consistency regularization is a commonly used technique in machine learning to improve the generalization ability and stability of models. Specifically, it often employs input perturbation techniques [10,22]. For example, in image classification, it is common to elastically deform an input image or add noise to it, which can dramatically change the pixel content without altering its label. In other words, such perturbations artificially expand the training set by generating a near-infinite stream of new, modified data. Many methods based on consistency regularization have been proposed. For instance, [22] increases the variability of the data by incorporating stochastic transformations and perturbations in deep semi-supervised learning, while minimizing the discrepancy between the predictions of the same unlabeled sample under different perturbations. Temporal Ensembling [23] enforces consistency by minimizing the mean squared difference between the predicted probability distributions of two data-augmented views. Mean Teacher [10] further extends this concept by replacing the aggregated predictions with the output of an exponential moving average (EMA) model. VAT [24] implements consistency regularization through a virtual adversarial loss: it perturbs samples in the input space with a small noise that maximizes the adversarial loss, forcing the model to produce consistent predictions for these perturbed samples. In short, in semi-supervised learning a classifier should output the same class distribution for an unlabeled example whether or not it is augmented. For an unlabeled point x, in the simplest case this is achieved by adding a regularization term to the loss function:

$\left\| p_{\mathrm{model}}\big(y \mid \mathrm{Augment}(x); \theta\big) - p_{\mathrm{model}}\big(y \mid \mathrm{Augment}(x); \theta\big) \right\|_2^2 \qquad (1)$
Note that Augment(x) is a stochastic transformation, so the two terms in Equation (1) are not identical. VAT [24] instead computes an additive perturbation of the input that maximally changes the output class distribution. MixMatch [9] applies consistency regularization through standard data augmentation for images (random horizontal flips and crops). FixMatch [19] distinguishes between two degrees of data augmentation: weak augmentation uses standard transformations, while strong augmentation may include more aggressive random cropping, rotation, scaling, affine transformations, etc.
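As a concrete illustration of Equation (1), the sketch below computes the simplest form of the consistency term, assuming a hypothetical stochastic `augment` function and a `model` returning class logits; the two forward passes see independent random augmentations of the same unlabeled batch.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, augment):
    """Squared L2 distance between predictions on two stochastic augmentations
    of the same unlabeled batch (the regularizer in Equation (1))."""
    p1 = F.softmax(model(augment(x_unlabeled)), dim=1)
    p2 = F.softmax(model(augment(x_unlabeled)), dim=1)  # an independent random draw
    return ((p1 - p2) ** 2).sum(dim=1).mean()
```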

3. Pseudo-Labeling

Pseudo-labels are artificial labels generated by the model itself; they help the model learn more robust and generalized representations. However, pseudo-labels must be used with caution, as the underlying predictions may be erroneous or uncertain and can therefore introduce noise. Among pseudo-labeling-based approaches, [11] performs entropy minimization implicitly by constructing hard (one-hot) labels from high-confidence predictions on unlabeled samples. TSSDL [25] introduces confidence scores, determined from the density of the local neighborhood around each unlabeled sample, to measure the reliability of pseudo-labels. In [26], a teacher model is trained on labeled data to generate pseudo-labels for unlabeled data, and a noisy student model is then trained on the unlabeled data with these pseudo-labels. R2-D2 [27] updates the pseudo-labels through an optimization framework, generating them with a decipher model during repeated prediction passes over the unlabeled data. In summary, appropriate measures can ensure the reliability of pseudo-labeled data; the common principle is to focus on high-confidence (low-entropy) samples that lie away from the decision boundary.
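The following sketch shows this basic hard pseudo-labeling step, assuming a hypothetical `model` that returns class logits; only samples whose maximum predicted probability exceeds a confidence threshold are kept as pseudo-labeled training data.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_pseudo_labels(model, x_unlabeled, threshold: float = 0.95):
    """Keep only unlabeled samples whose top predicted probability exceeds
    `threshold`, and return them with their hard pseudo-labels."""
    probs = F.softmax(model(x_unlabeled), dim=1)
    confidence, labels = probs.max(dim=1)  # per-sample max probability and its class
    mask = confidence >= threshold         # high-confidence (low-entropy) samples only
    return x_unlabeled[mask], labels[mask]
```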

4. The Combination of Consistency Regularization and Pseudo-Labeling

Some methods [28,29,30,31] integrate both approaches in a unified framework, often called the holistic approach. As one of the pioneering works, FixMatch [19] first generates a pseudo-label from the model's prediction on a weakly augmented instance and then encourages the prediction on the strongly augmented instance to follow that pseudo-label. Its success inspired many variants, e.g., those using curriculum learning [30,31]. FlexMatch [30] dynamically adjusts the predefined threshold in a class-specific manner based on the estimated learning status of each class, measured by the number of confident unlabeled samples assigned to it. Dash [31] selects, at each optimization step, the unlabeled data whose loss value does not exceed a dynamic threshold to train the learning model. These methods achieve high accuracy, comparable to supervised learning in a fully labeled setting.
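As an illustration of the class-specific threshold idea, the sketch below scales a base confidence threshold per class by an estimate of that class's learning status (the normalized count of confident predictions it has received), so under-learned classes admit lower-confidence pseudo-labels. The names and the exact scaling are illustrative and simplified, not the exact FlexMatch formulation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def class_wise_mask(model, x_unlabeled, num_classes: int, base_threshold: float = 0.95):
    """Return hard pseudo-labels and a boolean mask built from class-specific
    thresholds scaled by each class's estimated learning status."""
    probs = F.softmax(model(x_unlabeled), dim=1)
    confidence, labels = probs.max(dim=1)

    # Estimate learning status from how many confident predictions each class received.
    confident = confidence >= base_threshold
    counts = torch.bincount(labels[confident], minlength=num_classes).float()
    learning_status = counts / counts.max().clamp(min=1.0)  # one value in [0, 1] per class

    # Under-learned classes get a lower threshold; well-learned classes keep the base one.
    per_sample_threshold = base_threshold * learning_status[labels]
    mask = confidence >= per_sample_threshold
    return labels, mask
```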

This entry is adapted from the peer-reviewed paper 10.3390/math11163537
