Agricultural robots play a crucial role in ensuring the sustainability of agriculture. Fruit detection is an essential part of orange-harvesting robot design: ripe oranges must be detected accurately in the orchard so that they can be picked successfully. Accurate fruit detection in the orchard is significantly hindered by variations in illumination and by occlusion of the fruit. Hence, it is important to detect fruit in a dynamic environment based on real-time data.
1. Introduction
Detection systems play a crucial role in the design of fruit-harvesting robots. The robot must be able to detect the fruit on the tree and locate it accurately, and inaccurate fruit detection can greatly reduce the robot's picking success rate. Fruit detection and localization have proven to be among the most challenging tasks for fruit-harvesting robots. The most common problems are variations in illumination and occlusion of the fruit on the tree. Illumination variation makes the same fruit look different during the day and at night, dense overlapping leaves can hide fruit from view, and clusters of fruit make individual fruits difficult to recognize. Robotic perception is therefore especially difficult in complex environments such as fruit orchards. In most previous studies, the datasets used do not represent real orchard conditions: models are typically trained on images of single fruits, augmented using various techniques. In reality, fruit on the tree is subject to varying illumination, occlusion by other fruit, and overlap with branches and leaves. The lack of real orchard data for training often leads to more false detections.
Computer vision is commonly used in the design of fruit-harvesting robots. RGB-D cameras have significantly improved robotic vision by providing depth information alongside colour images, which supports both fruit recognition and localization. Recently, machine-learning algorithms have improved the performance of fruit detection, recognition, and the pose estimation used to guide the end-effector. In particular, deep-learning models, a subset of machine learning, are trained and tested on datasets of fruit images. Such a model is built as a sequence of layers whose weights are learned during training so that it can recognize the target fruit.
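The idea of a sequential model, in which layers are applied one after another and their weights are adjusted during training, can be illustrated with a minimal Python sketch. This is a toy under stated assumptions: the four input features, the layer sizes, and the random initial weights are hypothetical, no training loop is shown, and real fruit detectors use convolutional layers operating on whole images rather than dense layers on a short feature vector.

```python
import math
import random


def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in v]


def sigmoid(v):
    """Element-wise logistic activation, mapping scores to (0, 1)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


class Dense:
    """Fully connected layer computing y = W x + b.

    W and b are the weights that training would adjust; here they are
    only randomly initialized (hypothetical values, no training).
    """

    def __init__(self, n_in, n_out, rng):
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.W, self.b)]


class Sequential:
    """Applies each layer to the output of the previous one, in order."""

    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


rng = random.Random(0)
# Hypothetical architecture: 4 input features -> 8 hidden units -> 1 "orange" probability.
model = Sequential(Dense(4, 8, rng), relu, Dense(8, 1, rng), sigmoid)
p = model([0.9, 0.5, 0.1, 0.3])
print(f"P(orange) = {p[0]:.3f}")
```

Deep-learning frameworks provide the same pattern with convolutional layers and automatic gradient-based training; the sketch only shows how data flows through the ordered layers.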
2. Common CNN Models Used for Fruit Detection
The use of computer vision has yielded remarkable results, especially in agriculture, and fruit detection is one area that has seen immense improvement. Earlier research relied on manually extracted fruit features and used machine learning to classify them. Early methods identified fruit based on a single feature, such as colour, shape, or texture, by combining image preprocessing with a classification algorithm. More recently, advances in deep learning have been applied successfully to object detection. The ability of convolutional neural networks (CNNs) to extract useful features directly from an input image makes them a better choice for fruit detection, although preparing a sufficiently diverse dataset can be challenging. Many researchers have used deep-learning methods to detect different fruits, and CNN models are commonly used for fruit and plant detection in the agricultural sector [1]. Common CNN models used for fruit detection include ResNet, AlexNet, GoogLeNet, and VGG. Other deep-learning models, such as YOLO and SSD, have also been used to detect objects.
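The earlier single-feature approach can be illustrated with colour alone. The sketch below, assuming images given as lists of RGB tuples and using hypothetical HSV thresholds for "orange", marks each pixel whose hue, saturation, and brightness fall in a fixed range. It also shows why such methods are fragile in orchards: a shaded orange whose brightness drops below the hypothetical `VAL_MIN` would be missed, and any similarly coloured background would be accepted.

```python
import colorsys

# Hypothetical thresholds for "orange" in HSV space (hue in degrees).
HUE_MIN_DEG, HUE_MAX_DEG = 15.0, 45.0
SAT_MIN, VAL_MIN = 0.5, 0.4


def is_orange_pixel(r, g, b):
    """Classify one 8-bit RGB pixel by colour alone."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0
    return HUE_MIN_DEG <= hue_deg <= HUE_MAX_DEG and s >= SAT_MIN and v >= VAL_MIN


def orange_mask(image):
    """image: list of rows, each row a list of (r, g, b) tuples in 0..255."""
    return [[is_orange_pixel(*px) for px in row] for row in image]


def orange_fraction(image):
    """Fraction of pixels classified as orange."""
    mask = orange_mask(image)
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# 2x2 toy image: orange, leaf green / sky blue, darker orange.
image = [
    [(255, 165, 0), (34, 139, 34)],
    [(135, 206, 235), (230, 120, 20)],
]
print(orange_mask(image))      # [[True, False], [False, True]]
print(orange_fraction(image))  # 0.5
```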
Many CNN models have been tested for detecting various fruits. Apples have been detected using VGG [2] with an F1 score of 0.791, using an 11-layer AlexNet [3] with an accuracy of 92.5%, and using two CNN models [4]. Strawberries have been detected using ResNet, with detection accuracies of 95.78% [5] and 94% [6]. Kiwifruit have been detected using VGG, with a harvesting success rate of 51% [7] and a detection rate of 90.7% [8]. Similarly, sweet peppers have been detected using ResNet [9], and multiple fruits have been detected using VGG with F1 scores ranging from 0.791 to 0.94 [10]. Tomatoes have been detected using ResNet with a detection accuracy of up to 93% [11], whereas oranges have been detected using ResNet with an accuracy of 97.5% [12]. Other fruits detected using CNN models include grapes [13], papayas [14], lemons [15], bananas [16], melons [17], and cucumbers [18], among others.
Researchers have also detected fruit using YOLO and SSD models. These single-stage detectors predict object classes and bounding boxes in a single pass of the network and are therefore faster than the models mentioned above. YOLO has shown promising results for detecting oranges [19], cherries [20], apples [21], and other fruits, while SSD has been used to detect mangoes [22], pineapples [23], and others. Although YOLO and SSD are considered faster, the CNN models mentioned above, such as ResNet, are considered more accurate.
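Single-stage detectors such as YOLO and SSD typically output many overlapping candidate boxes, which are commonly filtered by non-maximum suppression (NMS) based on intersection over union (IoU). The sketch below implements standard greedy NMS in plain Python; the box format `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions, not the settings used in the cited studies. In clustered orange canopies, a threshold that is too low can suppress a genuinely separate neighbouring fruit, which is one way occlusion affects detection.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much.

    Returns indices of kept boxes, ordered by descending score.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one orange plus a separate orange.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```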
Fruit detection using CNN models has been implemented in various environments. Much of it is carried out in controlled environments such as laboratories or greenhouses: the detection of tomatoes [24,25], cucumbers, and other fruits has been carried out in greenhouses, whereas some fruit detection has been carried out in the field [10,26,27]. Fruit detection is easier in a controlled environment such as a greenhouse. Detecting fruit in an orchard is more challenging, because the environment, illumination, and other factors strongly affect the model's real-world performance. Fruit detection should therefore be evaluated in complex, real orchard environments to reveal the true accuracy of the model, and offline training followed by online testing should be adopted to obtain more reliable results.
In previous work on fruit detection using CNN models, many researchers obtained datasets from image repositories such as Fruit-360, COCO, OpenImages, and the AGROSEG dataset. These datasets contain images of single or multiple fruits placed on a table or in another structured environment. Sakib et al. [25] used Fruit-360 as a dataset for fruit detection, whereas Duong et al. [28] used ImageNet. Padhila et al. [26] used the OpenImages dataset for tomato detection. Only a few studies have used datasets collected in real orchard conditions. For example, Sa et al. [10] used a customized dataset with a network initialized from ImageNet-pretrained weights. Similarly, Ganesh et al. [12] used a customized dataset collected in an orchard in Citra, FL, USA, and Vasconez et al. [27] used real images of apples from orchards in California as fruit-detection datasets. Hence, there is a need for a more diverse dataset that includes images of fruit in real orchard scenarios, drawn not from a single orchard but from several orchards around the world. This would make the dataset more universal, since orchard images vary with topography, weather, and environment.
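One way to test whether a multi-orchard dataset actually yields a more universal model is to evaluate on orchards that were never seen during training. The sketch below performs a leave-one-orchard-out split, assuming each image is tagged with a hypothetical orchard identifier; scikit-learn's `LeaveOneGroupOut` offers an equivalent grouped split for array-based workflows.

```python
from collections import defaultdict


def leave_one_orchard_out(samples):
    """Yield (held_out_orchard, train_ids, test_ids) for each orchard.

    samples: list of (image_id, orchard_id) pairs. All images from the
    held-out orchard go to the test set, so no orchard appears in both sets.
    """
    by_orchard = defaultdict(list)
    for image_id, orchard_id in samples:
        by_orchard[orchard_id].append(image_id)
    for held_out in sorted(by_orchard):
        test_ids = list(by_orchard[held_out])
        train_ids = [
            img
            for orchard, imgs in sorted(by_orchard.items())
            if orchard != held_out
            for img in imgs
        ]
        yield held_out, train_ids, test_ids


# Hypothetical image and orchard identifiers.
samples = [
    ("img1", "orchard_A"),
    ("img2", "orchard_A"),
    ("img3", "orchard_B"),
    ("img4", "orchard_C"),
]
for held_out, train_ids, test_ids in leave_one_orchard_out(samples):
    print(held_out, train_ids, test_ids)
```

Compared with a random split, this keeps near-identical images from the same orchard out of both sets at once, so the reported accuracy better reflects performance in a new orchard.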
Similarly, many other researchers have used datasets for fruit detection taken from the internet. Most of these datasets contain images of a single fruit or of fruits photographed in a controlled environment. In reality, the fruit to be detected is in an orchard, with varying illumination, fruit occlusion, and overlapping leaves and fruit. It is therefore important to have a dataset based on real orchard images, yet very few studies use such images. Datasets captured in real orchard conditions can improve both the model's performance and its robustness.
Furthermore, most CNN models for fruit detection are trained, validated, and tested offline: both training and testing use a dataset stored on a local machine or a remote server, rather than live data from a camera or other sensor. An alternative approach is offline training with online testing, in which the model is trained on a stored dataset and then used to make predictions in real time on new, unseen images from a camera mounted on an agricultural machine. Very little data is available from online testing in real-time systems.
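The online-testing stage can be sketched as a loop that feeds each incoming camera frame to a model that was already trained offline and records per-frame latency. This is a minimal sketch under assumptions: `model` is any hypothetical callable returning `(box, score)` pairs, `frames` stands in for a live camera stream (for example, frames read one at a time from OpenCV's `VideoCapture`), and the 0.5 score threshold is illustrative. The model's weights are not updated during this stage.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float]


def online_inference(
    model: Callable[[object], List[Detection]],
    frames: Iterable[object],
    score_threshold: float = 0.5,
) -> Iterator[Tuple[int, List[Detection], float]]:
    """Run a frozen, offline-trained detector on frames as they arrive.

    Yields (frame_index, detections_above_threshold, latency_ms) per frame,
    so real-time performance can be measured alongside detection results.
    """
    for idx, frame in enumerate(frames):
        start = time.perf_counter()
        detections = [(box, score) for box, score in model(frame) if score >= score_threshold]
        latency_ms = (time.perf_counter() - start) * 1000.0
        yield idx, detections, latency_ms


def stub_model(frame):
    """Hypothetical stand-in for a trained detector."""
    return [((0, 0, 10, 10), 0.9), ((5, 5, 15, 15), 0.3)]


for idx, dets, ms in online_inference(stub_model, ["frame0", "frame1", "frame2"]):
    print(idx, dets, f"{ms:.3f} ms")
```

Logging latency together with detections is what distinguishes this from offline testing: a model that is accurate on a stored dataset may still be too slow for the robot's picking cycle.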