Image processing techniques are used in agriculture, among other tasks, to detect diseased leaves, stems, and fruits; to quantify the area affected by disease and determine its shape and color; to estimate productivity; to count the fruits entering a sorting machine; and to determine the size and shape of fruits.
2. Vision Systems for Fruit and Vegetable Classification
Video streaming and image processing are central to vision-based classification systems, but both are expensive in terms of processing time and computational resources. Video streaming provides information about the environment and supplies the visual features needed for visual quality inspection. Image processing techniques use these visual features as input to classification or clustering algorithms. Many IoT applications, such as video surveillance, healthcare, face recognition, human activity understanding, and farming, use video sensors
[13][14]. Some approaches use low-cost and low-power machine vision systems
[13]. However, most of these embedded video processing platforms perform poorly in real-time classification tasks because of their low computational power and bandwidth. In such cases, GPU-based or FPGA-based approaches are more suitable.
Machine vision-based fruit sorting systems can replace manual labor in the inspection of fruit size. Seema et al.
[15] reviewed fruit grading and classification systems. The authors summarize the features most often used to identify the degree of rotting and ripening and the kind of fruit, as well as the machine-learning (ML) models used by the reviewed algorithms. They found two approaches. The first, multiple-fruit identification systems, focuses on differentiating fruit types and disregards fruit quality; training these systems requires thousands of images of many different fruits. The second, specific fruit classification systems, uses large image sets of a single fruit type to train and test the sorter. Although the first approach is more general, the second is more suitable for single-type fruit sorting machines.
Concerning multiple fruit recognition approaches, Blasco et al. in 2003
[7] proposed a system to estimate the quality of oranges, peaches, and apples using four attributes: size, color, stem location, and detection of external blemishes. The proposed segmentation is based on Bayesian discriminant analysis, and fruit color is correlated with colorimetric index values. The authors tested the classification system with apples, obtaining a blemish detection accuracy of 86% and a size accuracy of 93%. Seng and Mirisaee
[16] proposed an image retrieval method that combines classification models built on three types of features (color, shape, and size) to increase recognition accuracy. The proposed system uses nearest-neighbor classification to recognize 15 different fruits from their feature values, obtaining an accuracy of 90%. Jana et al.
[12] proposed a system that preprocesses images to separate the fruit in the foreground from the background. Their system extracts texture features from the Gray-level Co-occurrence Matrix (GLCM) and statistical color features from the segmented image. The system creates a single feature descriptor from the extracted features and trains a Support Vector Machine (SVM) classification model. The generated model predicts the category for an unlabeled image from the validation set. The proposed method obtains an 83.33% overall accuracy. De Goma et al.
[11] proposed a system to recognize fruits using a K-nearest neighbor (KNN) classifier based on the statistical values of the color moments, GLCM features, the area in pixels as a measure of size, and shape roundness. They used a dataset of 2633 images in 15 different categories, obtaining an accuracy of 81.94%.
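Several of the systems above build part of their feature descriptor from the Gray-level Co-occurrence Matrix. As a rough illustration of what such texture features measure, the following Python sketch builds a normalized, symmetric GLCM for a small quantized patch and derives three common statistics: contrast, angular second moment (ASM), and homogeneity. The patch values, the horizontal one-pixel offset, and the function names are assumptions chosen for illustration; the cited works do not publish this code, and their exact feature sets may differ.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Return a normalized co-occurrence matrix as {(level_a, level_b): probability}.

    `image` is a list of rows of integer gray levels; (dx, dy) is the pixel offset.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                if symmetric:
                    # Count the pair in both directions so that P(a, b) == P(b, a).
                    counts[(b, a)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image is too small for the requested offset")
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p):
    """Compute three common GLCM statistics from a normalized matrix."""
    contrast = sum(prob * (i - j) ** 2 for (i, j), prob in p.items())
    asm = sum(prob ** 2 for prob in p.values())
    homogeneity = sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())
    return {"contrast": contrast, "asm": asm, "homogeneity": homogeneity}


# Hypothetical 4x4 patch already quantized to four gray levels (0..3).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(patch))
print(features)
```

In a pipeline such as the ones described, values like these would be concatenated with color statistics into one feature vector per image before training a classifier such as an SVM or KNN.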
Concerning orange fruit classification systems, Subramaniam and Balasubramanian
[17] used parallel computing techniques on a multi-core processor to grade citrus fruits. They used the Task Parallel Library to add parallelism and concurrency to the application. They extracted geometrical features such as diameter, perimeter, area, and circularity under laboratory-simulated real-time conditions without a suitable conveyor. The system estimated the diameter of the fruit with 98% accuracy. Sirisathitkul et al.
[18] proposed an image processing technique to sort Chokun oranges by maturity. In the training step, they captured images of 90 Chokun oranges at three different degrees of maturity with a color digital camera under normal illumination conditions. They performed an RGB to HSV color transformation for each image and used the hue values to generate a set of decision rules. They tested the proposed model using 50 Chokun orange samples, obtaining a 98% accuracy. Chen et al.
[19] proposed an orange sorting method that uses image processing to obtain four main features of the oranges: surface color, size, surface defects, and shape. They trained a backpropagation neural network with these features and report a sorting accuracy of 94.38%. Peter et al.
[20] proposed an automatic system for disease identification in images of infected fruits. The approach is evaluated on three diseases of navel oranges, namely citrus canker, citrus melanose, and citrus black spot, achieving 93% accuracy using a global color histogram, local binary patterns, and Haralick texture features. Patel et al. (2019)
[21] reported a system for sorting oranges and detecting bacterial spot defects based on four features: shape, size, color, and texture. They evaluated an SVM classifier, obtaining a 67.74% overall accuracy. Behera et al.
[22] proposed a system to grade oranges and identify deformities. They used a multi-class SVM with K-means clustering to classify orange diseases with an accuracy of 90%, and they used fuzzy logic to compute the degree of disease severity. Ifmalinda and Putri
[23] proposed an orange sorting program based on diameter and skin color. They used diameter and RGB index to generate a set of rules to classify oranges, obtaining an overall accuracy of 87%. Wang et al.
[24] proposed an algorithm to predict the sugar content of citrus fruits and performed a classification of the sugar content using light in the visible spectrum. Similar approaches for sorting apples can be found in
[5][9][10][25]; for tomatoes in
[20][26]; for sorting watermelons in
[27]; for palm oil fruit sorting in
[28]; and dates in
[29].
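Two of the simpler ingredients mentioned above, hue-based decision rules after an RGB to HSV transformation and circularity as a geometric feature, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the hue thresholds, the class names, and the function names are hypothetical and are not taken from any of the cited works, and the input pixels are assumed to be already segmented fruit pixels. Circularity is computed with the common definition 4πA/P², which equals 1 for a perfect circle.

```python
import colorsys
import math


def mean_hue_degrees(pixels):
    """Circular mean hue (0-360 degrees) of RGB pixels with 0-255 channels."""
    sin_sum = cos_sum = 0.0
    for r, g, b in pixels:
        hue, _sat, _val = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        angle = 2 * math.pi * hue
        sin_sum += math.sin(angle)
        cos_sum += math.cos(angle)
    # Averaging on the unit circle avoids the wrap-around at 0/360 degrees.
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360


def maturity_class(hue_deg):
    """Map a mean hue to a maturity label using HYPOTHETICAL thresholds."""
    if hue_deg > 180:
        hue_deg -= 360  # treat reddish hues just below 360 as small negative angles
    if hue_deg >= 70:
        return "immature"  # greenish peel
    if hue_deg >= 45:
        return "turning"   # yellowish peel
    return "mature"        # orange peel


def circularity(area, perimeter):
    """Return 4*pi*A / P^2, which is 1.0 for a circle and lower for other shapes."""
    return 4 * math.pi * area / perimeter ** 2


# Hypothetical segmented pixels from an orange-colored fruit.
sample = [(255, 128, 0), (250, 120, 10), (245, 135, 5)]
print(maturity_class(mean_hue_degrees(sample)))
print(round(circularity(math.pi * 35.0 ** 2, 2 * math.pi * 35.0), 3))
```

Rule sets of this kind are cheap to evaluate, which is part of their appeal for sorting lines, but the thresholds have to be calibrated for each cultivar and lighting setup.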
Few works address high-performance implementations using FPGAs. Martínez-Usó et al. in 2005
[8] proposed an unsupervised segmentation algorithm based on a multi-resolution approach applied to multi-spectral images of fruits as a quality assessment application. Lyu et al.
[30] proposed a citrus flower recognition model based on YOLOv4-Tiny lightweight neural network using software and hardware co-design patterns. They generated the dynamic link library and integrated it into the FPGA-based embedded platform. The recognition accuracy of the citrus flower recognition model deployed on the embedded platform for flowers and buds was not less than 89.30%, and the frame rate was not lower than 16 FPS.
Zhenman et al. proposed an analytical model to compare FPGA and GPU performance. They found that FPGAs can provide comparable or even better performance than a GPU while consuming, on average, 28% of the power required by a GPU for most Rodinia kernels. Even though FPGAs run at a lower clock frequency than GPUs, an FPGA usually achieves a higher number of operations per cycle in each computing pipeline due to its small pipeline initiation interval and considerable pipeline depth
[31]. Zhang et al. proposed an FPGA acceleration of the generalized sparse matrix–matrix multiplication, an essential computing kernel for many algorithms in artificial intelligence
[32]. They evaluated a Huffman tree scheduler on 20 real-world benchmarks, finding that energy efficiency and performance increased by 6× and 4×, respectively. Qasaimeh et al. assessed the energy efficiency of CPU, GPU, and FPGA implementations of computer vision kernels. They benchmarked vision algorithms based on the OpenVX standard on GPU and FPGA platforms. For simple kernels, GPUs obtain a 1.1–3.2× energy/frame reduction. However, for complex kernels and complete vision pipelines, the FPGA outperforms GPUs, obtaining a 1.2–22.3× energy/frame reduction
[33]. Guo et al. reviewed state-of-the-art neural network accelerator designs. They concluded that FPGAs achieve more than 10× better speed and energy efficiency than state-of-the-art GPUs
[34]. Sanaullah and Herbordt evaluated the hardware implementation of 3D Fast Fourier Transforms (FFTs) using OpenCL as a hardware description language. Their implementation achieves an average speedup of 29× versus a contemporary CPU and 4.1× versus a recent GPU
[35]. Fowers et al. compared the performance and energy of sliding-window applications implemented on FPGAs, GPUs, and multicore devices. They concluded that FPGAs provide a significant performance increase in most cases, with speedups of up to 11× and 57× compared with GPUs and multicores, respectively
[36].
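The comparisons above are expressed as speedups and energy-per-frame reduction ratios. The short Python sketch below shows how such ratios are typically computed from measured execution time, average power, and frame rate. All numeric values are hypothetical and serve only to illustrate the arithmetic; they are not measurements from the cited studies, and the function names are assumptions.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


def energy_per_frame(avg_power_watts, frames_per_second):
    """Energy spent on one frame, in joules (watts are joules per second)."""
    return avg_power_watts / frames_per_second


def reduction_ratio(reference, candidate):
    """How many times smaller the candidate value is than the reference."""
    return reference / candidate


# HYPOTHETICAL measurements: both devices sustain the same frame rate,
# but the FPGA draws 28% of the GPU's power.
gpu_joules = energy_per_frame(avg_power_watts=100.0, frames_per_second=60.0)
fpga_joules = energy_per_frame(avg_power_watts=28.0, frames_per_second=60.0)

print(round(reduction_ratio(gpu_joules, fpga_joules), 2))  # energy/frame reduction
print(speedup(baseline_seconds=2.0, accelerated_seconds=0.5))  # speedup over baseline
```

With equal throughput, the energy/frame reduction is simply the inverse of the power ratio, which is why a device with comparable performance and a fraction of the power draw can be several times more energy efficient.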
Recently, there have been efforts to use deep learning as an effective technique for fruit sorting. In
[4], the authors propose a real-time visual inspection system for sorting fruits using a classification model obtained from state-of-the-art deep-learning convolutional networks. They tested their system using apples and bananas. During real-time testing, the system obtained an accuracy of 96.7% for apples and 93.8% for bananas. For the training stage, they used a database of 8791 apples and 300 bananas, covering both healthy and defective fruits. Kukreja and Dhiman, in 2020
[37], proposed a dense CNN algorithm to detect the apparent defects of citrus fruit. A first model, generated without preprocessing or data augmentation on 150 images, achieved an accuracy of 67%. A second model, generated with data augmentation and preprocessing on 1200 images, attained an accuracy of 89.1%. Sa et al. in 2016
[38] proposed an approach to fruit detection using deep convolutional neural networks, with application to automated harvesting using a robotic platform. Their detector combines imagery obtained from two modalities: color (RGB) and near-infrared (NIR). They report both precision and recall, with detection performance for sweet peppers improving from 80.7% to 83.8%. They also created a model to detect seven fruits; annotating and training the new model took four hours per fruit. Leelavathy et al., in 2021
[39], proposed a CNN-based orange fruit image classifier using a binary cross-entropy loss function, obtaining an overall accuracy of 78.57%. Hossain et al., in 2019
[40], proposed a framework based on two different deep learning architectures. The first is a proposed light model with six convolutional neural network layers, while the second is a fine-tuned, pre-trained Visual Geometry Group-16 (VGG-16) deep learning model. They used two color-image datasets to evaluate their proposed framework. The first dataset contains clear fruit images, while the second contains fruit images with noise, illumination, and pose variations, which are much harder to classify. On dataset 1, the first and second models achieved classification accuracies of 99.49% and 99.75%, respectively. On dataset 2, they obtained accuracies of 85.43% and 96.75%, respectively.
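The works above report accuracy, precision, and recall, and one of them trains with a binary cross-entropy loss. The following Python sketch shows how these quantities are computed for a two-class problem such as healthy versus defective fruit. The labels and predicted probabilities are hypothetical, the function names are assumptions, and the F1 score is included only as the usual way of summarizing precision and recall in a single number.

```python
import math


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def binary_cross_entropy(y_true, probs, eps=1e-12):
    """Mean binary cross-entropy; `probs` are predicted probabilities of class 1."""
    loss = 0.0
    for t, q in zip(y_true, probs):
        q = min(max(q, eps), 1 - eps)  # clamp to avoid log(0)
        loss -= t * math.log(q) + (1 - t) * math.log(1 - q)
    return loss / len(y_true)


# HYPOTHETICAL labels: 1 = defective fruit, 0 = healthy fruit.
labels = [1, 1, 0, 0, 1]
probabilities = [0.9, 0.4, 0.2, 0.7, 0.8]
predictions = [1 if q >= 0.5 else 0 for q in probabilities]

print(precision_recall_f1(labels, predictions))
print(round(binary_cross_entropy(labels, probabilities), 4))
```

Accuracy alone can be misleading when one class dominates the dataset, which is one reason detection studies also report precision and recall.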
Other recent solutions use deep learning approaches to classify defects in fruits. In
[40], the authors propose a system that classifies orange images as fresh or rotten using a CNN with a SoftMax classifier, trained on 800 orange images and achieving an accuracy of 78.57%. In
[2], the authors generated a dataset of eight different classes of date fruits and compared several CNN models, such as AlexNet, VGG16, InceptionV3, ResNet, and MobileNetV2; MobileNetV2 architecture achieved an accuracy of 99%. In
[41], the authors present a deep-learning system for multi-class fruit and vegetable categorization based on an improved YOLOv4 model that first recognizes the object type in an image before classifying it into one of two categories: fresh or rotten. Compared with earlier models in the YOLO series, the proposed method obtained a higher average precision (50.4%) than the original YOLOv4 (49.3%) and YOLOv3 (41.7%). In
[42], the authors proposed an automatic image annotation approach to classify the ripeness of oil palm fruit and recognize a variety of fruits, trained with 100 images of oil palm fruit and 400 images of various fruits. Few of the systems reviewed above classify citrus fruits by color or size; most focus specifically on fruit defects.