Challenges in Agricultural Image Datasets and Filter Algorithms
Smart farming is facilitated by remote sensing because it allows for the inexpensive monitoring of crops, crop classification, stress detection, and yield forecasting using lightweight sensors over a wide area in a relatively short amount of time. Deep learning (DL)-based computer vision is one of the important tools for the automatic detection and monitoring of plant stress. Challenges for DL algorithms on agricultural datasets include size variation of objects, low image resolution, background clutter, the need for precise expert annotation, high object density, and the demand for different spectral images.
  • pansharpening
  • filtering
  • unmanned aerial vehicle (UAV) images
  • deep learning

1. Challenges in Agricultural Image Datasets

Machine Learning (ML) models have great potential for various agricultural applications, e.g., plant stress detection. However, these applications pose their own specific challenges, which require image preprocessing. For instance, in the case of disease detection in plant stands, different diseases can show almost identical spectral features depending on disease stage and environmental influences, making digital image processing difficult. This can challenge even experts in visual differentiation [1]. Likewise, computer vision algorithms encounter difficulties in detecting these diseases [2]. In the case of weed detection, weeds sharing the same leaf colour and spectral features as the crop to be protected can lead to low detection performance [3].
In general, large image datasets are required to train robust ML and deep learning (DL) models. An insufficient number of training images can cause overfitting, leading to a reduction of the model’s generalization capability [4]. Careful annotation of the available image datasets is an indispensable preprocessing step for agricultural use cases. Disturbing background scenes and objects, such as soil and other biomass, complicate target annotation in visible images for automated disease and weed detection [5][6]. Noise, blurring, brightness, and contrast issues can degrade image quality, where image noise results from the interaction of natural light and camera mechanics [7]. Due to the speed of unmanned aerial vehicles (UAVs), captured images can be affected by motion blur and excessive brightness, posing a significant challenge for classification and object detection [8]. Moreover, low-priced sensors produce low-quality images when UAVs fly at high altitudes, while UAV propellers create shadows and blurring in low-altitude photographs [9]. Clouds and their shadows are significant obstacles for space-borne sensors, interfering with the identification of plants and their diseases [10]. Low-light conditions diminish an image’s clarity and uniformity, resulting in low contrast and a distorted focal point, which is a significant issue in object detection [11].
Multispectral images comprise more bands than RGB images, where each band captures different data characteristics (e.g., red, near-infrared) of the same scene. Hence, band-to-band registration is needed to merge the information from the individual bands. However, achieving proper image alignment is a well-known issue in the registration task [12]. Another issue with multispectral images is that they often lack sufficient spatial resolution, which is undesirable for subsequent image-processing tasks. The fusion of multispectral images with high spatial resolution panchromatic (PAN) images, also called pansharpening, can yield images improved in both spectral and spatial detail [13]. Pansharpened images can improve the accuracy of deep neural network-based object detection and image classification, as demonstrated, e.g., in [14][15]. On the other hand, Xu et al. [16] found that most pansharpening algorithms suffer from distortions of color and spatial details. Moreover, misregistration and object size differences also result in poor and blurry pansharpened images. As a result, object detection algorithms struggle with spatially distorted pansharpened images [17].
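As a minimal sketch of band-to-band registration, the example below estimates a translational offset between two bands via phase cross-correlation and applies it. It assumes two equally sized, single-band NumPy arrays; the function name `register_band` and the parameter values are illustrative, not taken from the cited works.

```python
# Minimal band-to-band registration sketch (assumes pure translation between bands).
import numpy as np
from skimage.registration import phase_cross_correlation
from scipy.ndimage import shift as nd_shift

def register_band(reference_band: np.ndarray, moving_band: np.ndarray) -> np.ndarray:
    """Estimate the offset of `moving_band` relative to `reference_band`
    and return the shifted (registered) band."""
    # Sub-pixel shift estimate via phase cross-correlation in the Fourier domain.
    offset, error, _ = phase_cross_correlation(
        reference_band, moving_band, upsample_factor=10
    )
    # Apply the estimated offset to align the moving band with the reference.
    return nd_shift(moving_band, shift=offset, order=1, mode="nearest")
```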

2. Filter Algorithms

Due to abnormalities in the capturing process, sensor issues, and environmental conditions, images can suffer degradation through blurring, noise, geometric distortions, inadequate illumination, and lack of sharpness [18]. Various high- and low-pass image filtering algorithms are available for image preprocessing tasks such as denoising, enhancement, deblurring, histogram equalization, and contrast correction. Image noise is undesired information that degrades the visual quality of images and can be caused by various factors, such as data acquisition, signal transmission, and computational errors [19]. Usually, images are corrupted with additive noise, such as salt-and-pepper, Gaussian, or Poisson noise, or multiplicative noise, such as speckle noise [20]. Salt-and-pepper noise arises from sudden disturbances in the image acquisition process, for example dust particles or overheated components, and appears as black and white dots [21]. As the name suggests, Gaussian noise follows a Gaussian distribution: in the noisy image, each pixel is the sum of the true pixel value and a random, normally distributed noise value [22]. Poisson noise occurs when the number of photons detected by the sensor is not sufficient to provide measurable statistical information [23]. Speckle noise is a common phenomenon in coherent imaging systems, for example laser, Synthetic Aperture Radar (SAR), ultrasound, and acoustic imaging, where the noisy image is the product of the received signal and the speckle noise [24].
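The sketch below simulates these four noise models on a grayscale image stored as a float NumPy array in [0, 1]; the function names and noise parameters are illustrative assumptions, not definitions from the cited references.

```python
# Illustrative noise models: salt-and-pepper, Gaussian, Poisson, and speckle.
import numpy as np

rng = np.random.default_rng(0)

def salt_and_pepper(img, amount=0.02):
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < amount / 2] = 0.0          # "pepper": black dots
    noisy[mask > 1 - amount / 2] = 1.0      # "salt": white dots
    return noisy

def gaussian(img, sigma=0.05):
    # Additive noise drawn from a normal distribution.
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 1)

def poisson(img, peak=30):
    # Photon-counting noise: variance scales with signal intensity.
    return np.clip(rng.poisson(img * peak) / peak, 0, 1)

def speckle(img, sigma=0.1):
    # Multiplicative noise typical of coherent imaging (e.g., SAR).
    return np.clip(img * (1.0 + rng.normal(0.0, sigma, img.shape)), 0, 1)
```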
Starting with denoising, Gaussian and Wiener denoising, median, and bilateral filters are standard filters for eliminating unwanted noise from images [25]. The median filter is a nonlinear denoising filter that removes salt-and-pepper noise and softens edges. The bilateral filter is used in many denoising tasks due to its edge-preserving property [26][27]. The Wiener filter, in turn, performs well in denoising images corrupted by speckle and Gaussian noise [28]. Archana and Sahayadhas [29] reported that the Wiener filter outperformed the Gaussian and mean filters for noise removal in images of paddy leaves.
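A minimal sketch of these classic filters using OpenCV and SciPy is shown below; the input path and all kernel sizes and sigma values are illustrative and would need tuning per dataset.

```python
# Classic denoising filters on an 8-bit grayscale image (illustrative parameters).
import cv2
import numpy as np
from scipy.signal import wiener

img = cv2.imread("leaf.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

gauss_denoised = cv2.GaussianBlur(img, (5, 5), 1.0)                  # low-pass smoothing
median_denoised = cv2.medianBlur(img, 5)                             # suited to salt-and-pepper noise
bilateral_denoised = cv2.bilateralFilter(img, 9, 75, 75)             # edge-preserving smoothing
wiener_denoised = wiener(img.astype(np.float64), mysize=(5, 5))      # suited to Gaussian/speckle noise
```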
Image blur refers to unsharp regions of an image caused by camera movement, shaking, or lack of focus; it is commonly classified into average blur, Gaussian blur, motion blur, defocus blur, and atmospheric turbulence blur [30]. Blur limits image quality and corrupts important image information [31]. Image deblurring filters are inverse techniques that aim to restore the sharpness of degraded images [31]. Deblurring algorithms are broadly classified into two main categories, blind and non-blind, depending on the availability of Point Spread Function (PSF) information [32]. Among the non-blind techniques, the Wiener filter is one of the most common for restoring images degraded by motion blur, unfocused optics, noise, and linear blur [33]. According to Al-Ameen et al. [34], the Laplacian sharpening filter performs well with Gaussian blur but poorly with noisy images, whereas, with a larger number of iterations, the optimized Richardson-Lucy algorithm is the more stable option for blurry and noisy images [34]. In plant disease diagnosis, edge sharpening filters can highlight the pixels around the border of a region and thereby improve image segmentation [35]. Maximum likelihood-based image deblurring algorithms, on the other hand, do not require PSF information and are therefore an effective tool for blind image deblurring [36]. Yi and Shimamura [37] developed an improved maximum likelihood-based blind image restoration technique for images degraded by noise and blur. Moreover, among blind restoration techniques, unsharp masking is a classic method for restoring blurry and noisy images, enhancing details and improving edge information [38].
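The sketch below illustrates non-blind restoration with a known PSF using the Wiener and Richardson-Lucy implementations in scikit-image; the horizontal motion-blur PSF, noise level, and regularization values are assumptions for illustration only.

```python
# Non-blind deblurring with a known PSF (Wiener deconvolution and Richardson-Lucy).
import numpy as np
from scipy.signal import convolve2d
from skimage import data, restoration

# Simple horizontal motion-blur kernel (PSF), normalized to sum to 1.
psf = np.zeros((9, 9))
psf[4, :] = 1.0 / 9

# Synthesize a degraded image: blur a test image and add mild Gaussian noise.
image = data.camera() / 255.0
blurred = convolve2d(image, psf, mode="same", boundary="symm")
blurred += 0.01 * np.random.default_rng(0).normal(size=blurred.shape)
blurred = np.clip(blurred, 0, 1)

# Both methods assume the PSF is known (non-blind restoration).
wiener_restored = restoration.wiener(blurred, psf, balance=0.1)
rl_restored = restoration.richardson_lucy(blurred, psf, num_iter=30)
```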
Edge detection filters are essential for extracting edges at intensity discontinuities. For image enhancement, Histogram Equalization (HE) is a standard method for transforming a dark image into a clearer one. It stretches the image’s dynamic range by flattening the histogram, turning low-contrast areas into regions of more distinct contrast [39]. Moreover, it can improve the accuracy of automatic leaf disease detection, as shown in [40]. Adaptive Histogram Equalization (AHE) reduces HE’s limitations by increasing the local contrast of the input images. Contrast Limited Adaptive Histogram Equalization (CLAHE) is a further refined form of HE that reduces noise amplification and can further improve the clarity of distorted input images [41][42]. CLAHE has been demonstrated to improve Convolutional Neural Network (CNN) classification accuracy by enhancing low-contrast, poor-quality images [43].
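A minimal sketch of global HE and CLAHE with OpenCV follows; the input path, clip limit, and tile grid size are illustrative values.

```python
# Global histogram equalization vs. CLAHE on an 8-bit grayscale image.
import cv2

gray = cv2.imread("field.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

# Global HE: flattens the histogram over the whole image.
equalized = cv2.equalizeHist(gray)

# CLAHE: equalizes contrast per tile and clips the histogram to limit
# noise amplification.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
clahe_equalized = clahe.apply(gray)
```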

3. Deep Neural Networks and Generative Adversarial Networks

Deep neural networks have demonstrated strong performance on a variety of image restoration tasks in recent years, including denoising, deblurring, and super-resolution [44]. Tian et al. [45] reviewed contemporary state-of-the-art networks for image denoising and concluded that most DL-based denoising networks perform well with additive noise. Ground truth, however, is the most critical factor for robust feature learning in DL models, whereas images taken in real-world environments usually exhibit inherent, not artificially added, noise and thus lack a ground truth. Dealing with this issue constitutes a significant research direction in DL-based denoising, and recent studies have therefore developed self-supervised denoisers such as Self2Self [46], Neighbor2Neighbor [47], and Deformed2Self [48].
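For orientation, the sketch below shows a compact residual denoising CNN in the general spirit of the DL-based denoisers discussed above; the architecture is deliberately simplified for illustration and is not the implementation of any cited work.

```python
# Simplified residual denoising CNN: the network predicts the noise residual.
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    def __init__(self, channels=1, features=64, depth=8):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1),
                       nn.BatchNorm2d(features),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, noisy):
        # Subtracting the predicted residual yields the denoised estimate.
        return noisy - self.body(noisy)

model = ResidualDenoiser()
denoised = model(torch.rand(1, 1, 64, 64))  # dummy noisy patch
```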
Generative Adversarial Networks (GANs) are considered a breakthrough in the DL domain for computer vision applications [49]. Since their introduction, GANs have been used in various computer vision tasks such as image preprocessing, super-resolution, and image fusion [50]. Recently, a hierarchical Generative Adversarial Network (HI-GAN) [51] has been developed to address the aforementioned issue of real-world noisy images. Unlike other DCNN-based denoisers, HI-GAN not only maintains a higher Peak Signal-To-Noise Ratio (PSNR) but also preserves high-frequency details and low-contrast features. As with denoising, several GAN models for deblurring have been developed in recent years. However, most models require corresponding pairs of blurred and sharp images for training (the ground-truth issue again), which conflicts with training on real-world data containing innate noise [52].
Nevertheless, unsupervised GANs have been developed to address these issues. Nimisha and Sunil [53] developed the first self-supervised approach for the unsupervised deblurring of real-world and synthetic images. A self-supervised model for blind deblurring was later enhanced by Liu et al. [54], while a self-supervised model for event-based real-world blurred photos was developed by Xu et al. [48]. Li et al. [55] designed the self-supervised You Only Look Yourself (YOLY) model, which can enhance images without ground truth or prior training, reducing the time and effort required for data collection.
In computer vision, deciphering low-resolution images represents a major hurdle for object detection and classification, because the resolution may not be sufficient for disease recognition [9]. This challenge is addressed by image super-resolution techniques. Dong et al. [56] presented SRCNN, the first lightweight CNN-based Single Image Super-Resolution (SISR) approach, which outperformed previous sparse-coding-based super-resolution models. As for deblurring and denoising, unsupervised and self-supervised DL strategies have been devised to tackle this upsampling problem for real-world images [57].
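As an illustration of the SISR idea, the sketch below follows the commonly described three-layer SRCNN design (patch extraction, non-linear mapping, reconstruction) in PyTorch; the filter sizes and the bicubic pre-upsampling are the widely quoted defaults, not a reproduction of the authors' original implementation.

```python
# Minimal SRCNN-style super-resolution sketch (9-1-5 configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRCNN(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)   # patch extraction
        self.map = nn.Conv2d(64, 32, kernel_size=1)                        # non-linear mapping
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)

    def forward(self, lr, scale=2):
        # Bicubic upsampling to the target resolution, then CNN refinement.
        x = F.interpolate(lr, scale_factor=scale, mode="bicubic", align_corners=False)
        x = F.relu(self.extract(x))
        x = F.relu(self.map(x))
        return self.reconstruct(x)

sr = SRCNN()(torch.rand(1, 1, 32, 32))  # dummy low-resolution patch
```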
In remote sensing, it is common to receive images from multiple sensors simultaneously, including panchromatic images providing high spatial resolution and lower-resolution multispectral images delivering the valuable spectral data. Images of higher spatial and spectral resolution can be obtained by fusing the simultaneously captured multispectral and panchromatic images, a process called pansharpening [58]. Broadly, pansharpening methods fall into five main groups: Component Substitution (CS), Multi-Resolution Analysis (MRA), Variational Optimization-based (VO), hybrid, and DL-based methods [59][60]. According to Javan et al. [60], MRA-based methods achieve higher spectral quality, hybrid methods perform better in terms of spatial quality, and CS-based methods perform worst in maintaining both spectral and spatial quality. Nonetheless, both CS- and MRA-based pansharpening methods produce images distorted in the spatial and spectral dimensions due to misregistration, an issue that DL models have been shown to resolve [61]. CNN- and GAN-based pansharpening models (see [62] for a recent review) produce a more stable spatial and spectral balance, achieving high correlation with the original multispectral bands.
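To make the basic fusion idea concrete, the sketch below implements a classic component-substitution-style baseline (Brovey transform), in which panchromatic spatial detail is injected into upsampled multispectral bands; the DL-based methods discussed above replace this hand-crafted fusion with learned mappings. The function name and array conventions are assumptions for illustration.

```python
# Brovey-transform pansharpening sketch (classic CS-style baseline).
import numpy as np
from skimage.transform import resize

def brovey_pansharpen(ms: np.ndarray, pan: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """ms: (bands, h, w) low-resolution multispectral image in [0, 1];
    pan: (H, W) high-resolution panchromatic image in [0, 1]."""
    # Upsample each multispectral band to the panchromatic resolution.
    ms_up = np.stack([resize(band, pan.shape, order=3) for band in ms])
    # Intensity component approximated by the mean of the upsampled bands.
    intensity = ms_up.mean(axis=0) + eps
    # Scale each band by the ratio of panchromatic detail to intensity.
    return ms_up * (pan / intensity)
```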
The current literature on applying the aforementioned DL-based techniques to agricultural image analysis reveals a strong and expanding body of research. Image Super-Resolution (SR) has led to higher classification accuracy in plant disease detection. For instance, SR images produced by Super-Resolution Generative Adversarial Networks (SRGAN) yielded higher classification accuracy for wheat stripe rust [63]. Similarly, a Wider-activation and Attention-mechanism-based Generative Adversarial Network (WAGAN) has led to higher classification accuracy for tomato diseases from low-resolution images [64]. Furthermore, the use of a Residual Skip Network-based SR method for enhancing image resolution, as demonstrated in grape leaf disease detection [65], and the integration of dual-attention and topology-fusion mechanisms within a Generative Adversarial Network (DATFGAN) for agricultural image analysis [66] have both contributed to improved classification accuracy.
Despite the limited number of studies on DL-based deblurring and motion deblurring in the context of agricultural images, the available research indicates noteworthy advances in crop image classification accuracy. Shah and Kumar [67] utilized DeblurGANv2 [68] to address motion blur in grape detection, which significantly improved classification accuracy. Correspondingly, the Wide Receptive Field Attention Network (WRA-Net) [69] was introduced to deblur motion-blurred images, improving crop-weed segmentation accuracy. Moreover, Xiao et al. [70] introduced a hybrid technique, SR-DeblurUGAN, combining image deblurring and super-resolution, which achieved stable performance on agricultural drone image enhancement.