Strawberry Ripeness Classification: History

Image analysis-based artificial intelligence (AI) models leveraging convolutional neural networks (CNNs) play a significant role in evaluating the ripeness of strawberries, contributing to the maximization of productivity. However, convolution, which accounts for the majority of operations in CNN models, imposes a significant computational burden. Additionally, the dense operations in the fully connected (FC) layer require a vast number of parameters and entail extensive external memory access. Therefore, reducing the computational burden of convolution operations and alleviating memory overhead are essential in embedded environments.

  • artificial intelligence (AI)
  • convolutional neural network (CNN)
  • strawberry
  • ripeness
  • sensor
  • computer vision technology

1. Introduction

Recently, AI technologies have developed rapidly, bringing significant changes to the field of agriculture. Research and applications that use AI to enhance agricultural productivity and promote sustainable farming practices have emerged in diverse forms [1,2,3]. Among AI applications, image analysis models play a critical role in agricultural production and management. Image-based AI models contribute significantly to discerning the state of crops and evaluating the ripeness of fruits, thereby predicting the optimal harvesting time to maximize productivity [4,5,6]. In particular, accurately measuring fruit ripeness is emphasized as a key task in the modern agricultural paradigm for optimizing yield and resource utilization. Strawberries are a representative crop whose ripeness can be evaluated based on color. They are economically valuable fruits owing to their sweet taste and rich nutritional content [7,8]. The flavor and market value of strawberries are significantly influenced by their degree of ripeness: the stage of ripening produces distinct differences in taste and has considerable implications for distribution. For these reasons, evaluating strawberry ripeness is essential, as timely harvesting enhances the marketability and economic viability of the fruit [9].
To harvest strawberries at the right time, their characteristics must be accurately extracted and classified by ripeness. Specifically, finding distinctive patterns or structures within images is crucial for extracting appropriate features. Traditional feature extraction methods include algorithms such as scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and features from accelerated segment test (FAST). SIFT detects image features that are invariant to scale and rotation [10]. SURF improves upon SIFT by providing faster processing [11]. FAST is a computer vision algorithm that rapidly detects corners in an image by identifying keypoints based on intensity differences between a central pixel and its neighboring pixels [12]. However, deep learning (DL), which automatically learns features from large amounts of data and is applicable to various complex tasks, has demonstrated higher performance and versatility than traditional feature extraction methods.
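To make the segment test behind FAST concrete, the following Python sketch checks whether a single pixel is a corner. It is a simplified illustration, not the reference implementation of [12]: it assumes a grayscale image stored as a list of rows, the 16-pixel circle of radius 3, an intensity threshold t, and n = 9 contiguous pixels (the common FAST-9 variant), and it omits the high-speed pre-test and non-maximum suppression. The function name `is_fast_corner` and the default parameter values are hypothetical.

```python
# Offsets (dy, dx) of the 16 pixels on a Bresenham circle of radius 3,
# in clockwise order starting from the pixel directly above the center.
CIRCLE = [
    (-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
    (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1),
]


def is_fast_corner(img, y, x, t=20, n=9):
    """Segment test: (y, x) is a corner if at least n contiguous circle
    pixels are all brighter than center + t or all darker than center - t."""
    center = img[y][x]
    # Label each circle pixel: +1 brighter, -1 darker, 0 similar.
    labels = []
    for dy, dx in CIRCLE:
        v = img[y + dy][x + dx]
        if v > center + t:
            labels.append(1)
        elif v < center - t:
            labels.append(-1)
        else:
            labels.append(0)
    # Scan the circle twice so runs that wrap from index 15 to 0 are counted.
    for target in (1, -1):
        run = 0
        for lab in labels + labels:
            run = run + 1 if lab == target else 0
            if run >= n:
                return True
    return False


# A pixel just inside the corner of a bright square is flagged;
# the same test on a straight vertical edge is not.
square = [[255 if (r >= 3 and c >= 3) else 0 for c in range(7)] for r in range(7)]
edge = [[255 if c > 3 else 0 for c in range(7)] for r in range(7)]
print(is_fast_corner(square, 3, 3), is_fast_corner(edge, 3, 3))  # True False
```

On the square, 11 contiguous circle pixels are darker than the center, which exceeds n = 9; on the edge, only 7 are brighter, so the pixel is rejected.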
Among DL models for image analysis, CNN models demonstrate particularly strong performance in extracting features and learning from images, making them effective for analyzing images in complex agricultural environments [13]. A CNN model combines convolution layers, which extract feature maps, with an FC layer, which performs the final classification based on the extracted feature maps. In a convolution layer, kernels are applied over small windows of the input to perform convolution operations, extracting various feature maps from the input image. During this process, each kernel captures relationships among the pixels within its window, revealing information about local patterns, edges, textures, and more [14]. The FC layer typically receives a flattened feature vector as input: the locally extracted, multi-channel feature map data from the convolution layers are flattened into a single one-dimensional feature vector. This vector is connected to the neurons of the FC layer, where each neuron corresponds to a specific feature. Through this connectivity, the model learns global patterns of the input image, leading to the final classification [15].
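The flattening step and the FC layer described above can be sketched in NumPy as follows. The sketch assumes a single FC layer mapping directly to three ripeness classes and uses hypothetical sizes (64 channels of 7 × 7 feature maps) with random weights; it is meant only to show how the multi-channel map becomes one vector and how the parameter count grows with C·H·W times the number of classes. The function name `flatten_and_classify` is hypothetical.

```python
import numpy as np


def flatten_and_classify(feature_maps, weights, bias):
    """Flatten a (C, H, W) feature map and apply one fully connected layer.

    weights has shape (num_classes, C*H*W): each output neuron holds one
    weight per element of the flattened vector.
    """
    x = feature_maps.reshape(-1)   # multi-channel map -> 1-D feature vector
    logits = weights @ x + bias    # one dot product per output neuron
    return int(np.argmax(logits))  # index of the highest-scoring class


# Hypothetical sizes: 64 channels of 7x7 maps, 3 ripeness classes.
rng = np.random.default_rng(0)
C, H, W, num_classes = 64, 7, 7, 3
fmap = rng.standard_normal((C, H, W))
w_fc = rng.standard_normal((num_classes, C * H * W))
b_fc = np.zeros(num_classes)

print(flatten_and_classify(fmap, w_fc, b_fc))  # a class index in {0, 1, 2}
print(w_fc.size + b_fc.size)                   # 9411 parameters
```

Even this single small layer needs 64 × 7 × 7 × 3 + 3 = 9,411 parameters; with larger feature maps or hidden FC layers of thousands of neurons, the parameter count grows rapidly, which is the origin of the memory-access problem discussed in this section.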
In CNN models, convolution operations constitute a significant portion of the overall computation. Equation (1) expresses the feature map as the cumulative sum of products between kernel weights and input values. Here, i and j denote the position in the output feature map, M and N represent the kernel size, and C indicates the number of channels in the input image. Convolution operations require significant computational resources, especially in AI processors that demand high performance. Therefore, the development of convolution accelerators has emerged as a crucial topic in the field [16]. In particular, CNN accelerators utilizing stochastic computing (SC-CNN) have improved energy efficiency and performance by introducing probabilistic bit representation and highly parallelized methods [17,18]. Additionally, binary neural networks (BNNs) restrict weights and activations to binary values, making them suitable for low-power, resource-constrained edge devices. By binarizing the network, BNNs improve the latency and throughput of specific hardware models [19,20].
 
$$\mathrm{Feature\ map}(i,j)=\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\sum_{c=0}^{C-1}\mathrm{Input}(m,n,c)\cdot\mathrm{Kernel}(i-m,\,j-n,\,c) \qquad (1)$$
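The triple summation over M, N, and C maps directly to nested multiply-accumulate (MAC) loops, which is why convolution dominates CNN computation. The following Python sketch is an illustration under stated assumptions: one output channel, stride 1, no padding, and the cross-correlation indexing Input(i + m, j + n, c) · Kernel(m, n, c) that most CNN frameworks compute, rather than the flipped kernel indexing of Equation (1). Either way, each output position costs M·N·C MACs. The function name `conv2d_single` is hypothetical.

```python
import numpy as np


def conv2d_single(inp, kernel):
    """Direct MAC-loop convolution for one output channel.

    inp: (H, W, C) input; kernel: (M, N, C). Stride 1, no padding,
    cross-correlation form. Returns the feature map and the MAC count.
    """
    H, W, C = inp.shape
    M, N, Ck = kernel.shape
    assert C == Ck, "kernel depth must match the number of input channels"
    out = np.zeros((H - M + 1, W - N + 1))
    macs = 0
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            acc = 0.0
            # Triple sum over kernel rows, kernel columns, and channels.
            for m in range(M):
                for n in range(N):
                    for c in range(C):
                        acc += inp[i + m, j + n, c] * kernel[m, n, c]
                        macs += 1
            out[i, j] = acc
    return out, macs


inp = np.ones((5, 5, 3))
k = np.ones((3, 3, 3))
fmap, macs = conv2d_single(inp, k)
print(fmap.shape, macs)  # (3, 3) 243
```

For this 5 × 5 × 3 input and 3 × 3 × 3 kernel, 9 output positions × 27 MACs gives 243 MACs; a real layer repeats this for every output channel, and convolution accelerators target exactly these inner loops.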
However, CNN accelerators primarily emphasize convolution operations, so operations in the FC layer also require consideration. As the depth of deep neural networks (DNNs) increases, the FC layer contains numerous parameters that must be accessed from external memory to perform extensive weight multiplications involving millions of parameters [21,22]. Furthermore, accelerating the FC layer is more challenging than accelerating the convolution layers, especially in achieving high data reuse. These requirements lead to speed degradation due to external memory access and to memory overhead arising from the substantial volume of parameters stored in external memory [23]. In embedded systems especially, limited memory resources and computational performance become obstacles to performing FC layer operations. Additionally, the massive computational volume of the FC layer hinders efficient power consumption and management in embedded systems. To address these challenges, various methods have been proposed, including techniques that use statistics of the extracted feature maps to reduce power consumption and approaches that employ the k-nearest neighbors (k-NN) algorithm [24]. The k-NN algorithm demonstrates robust performance, particularly on small-scale datasets, and its concise and intuitive structure makes it efficient in terms of both memory and computational resources. Furthermore, the k-NN algorithm performs relatively fewer computations than the FC layer, which involves weight multiplications for numerous nodes. Therefore, the algorithm notably enhances performance and efficiency on edge devices, where optimizing limited resources is paramount [25].
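As a rough illustration of how a k-NN classifier can take the place of the FC layer, the sketch below classifies a feature vector (for example, the flattened output of the convolution layers) by majority vote among its k nearest labeled training vectors. The distance metric and k are assumptions, not details taken from this entry: the sketch uses the L1 (Manhattan) distance, which needs only subtractions, absolute values, and additions, and k = 3. The function name `knn_predict` and the toy data are hypothetical.

```python
import numpy as np


def knn_predict(train_feats, train_labels, query, k=3):
    """Return the majority label among the k training vectors closest
    to `query` under L1 (Manhattan) distance."""
    # L1 distance: only subtraction, absolute value, and addition.
    dists = np.abs(train_feats - query).sum(axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]  # indices of the k closest
    votes = np.bincount(train_labels[nearest])       # labels must be ints >= 0
    return int(np.argmax(votes))                     # ties go to the smaller label


# Hypothetical 2-D feature vectors for two classes (0 = unripe, 1 = ripe).
train_feats = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_labels = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(train_feats, train_labels, np.array([0.2, 0.1])))  # 0
print(knn_predict(train_feats, train_labels, np.array([5.5, 5.2])))  # 1
```

There is no trained weight matrix to multiply against; the cost per query scales with the number of stored reference vectors times the feature length, which is why the approach is most attractive when the reference set is small.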

2. Strawberry Ripeness Classification Systems

With the development of AI, sensor technology, and computer vision technology, research has been conducted to apply these technologies to agricultural settings such as smart farms to create high added value [28,29,30,31,32,33,34,35]. In particular, many studies on systems that determine fruit ripeness have utilized AI and computer vision technology, as ripeness often has to be judged from external characteristics such as color and shape [28,29,30].
In [28], a system was proposed to estimate the ripeness of Hass avocados using hyperspectral imaging (HSI) and DL. A camera with a spectral resolution of 1.3 nm was employed to acquire hyperspectral images, and a total of 551 avocado images were obtained. The 44,096 sub-images derived from the avocado images were used to train and test a CNN that predicts the number of days remaining until the avocado ripens, achieving an average root mean squared error (RMSE) of 1.17 days. In [29], a system is presented that detects Cavendish bananas and Carabao mangoes with the you only look once (YOLO) algorithm and determines their ripeness with a support vector machine (SVM). On 33 test images, the system achieved an overall ripeness classification accuracy of 90.9%. Meanwhile, in addition to systems employing AI and computer vision technology, some studies use new sensors. In [30], a tactile sensor was used to determine fruit ripeness. The authors developed a low-cost, lithography-free, highly sensitive, and flexible capacitive tactile sensor to measure the firmness and stiffness of the fruit, and confirmed that the sensor's capacitance varies with the stiffness of tomatoes.
Strawberries have been widely used in ripeness detection research based on computer vision technology because their color changes distinctly with the degree of ripeness [31,32,33,34,35]. The authors of [31] utilized the YOLOv7 model for object detection and ripeness measurement of strawberries, and visualized the location and ripeness with augmented reality (AR) technology. The ripeness of strawberries was divided into ‘Unripe’, ‘Partially Ripe’, and ‘Ripe’, and model performance was improved by preventing overfitting through data augmentation. A total of 2900 images were used for training and validation, and ripeness prediction achieved an F1 score of 0.92. In [32], a two-path model that learns the coordinates of the stalk while learning ripeness through semantic segmentation was proposed. The model was based on VGG16, and its highest accuracy was 90.33%. In [33], a method of measuring strawberry ripeness by extracting RGB features and applying the k-NN algorithm was proposed. The authors fed the k-NN algorithm representative values calculated from features extracted from the RGB data of the image, such as color component values, area, roundness, and slenderness. Trained on 30 images and tested on 20 images, the method achieved a ripeness prediction accuracy of 85%. In [34], a multiclass SVM with a radial basis function (RBF) kernel was utilized. The proposed system consisted of image resizing to reduce execution time and memory usage, RGB-to-HSV conversion, feature extraction represented by the average RGB values, and classification with the SVM. The maximum accuracy of the system was 85.64%.
In [35], a ripeness detection system was proposed that combines semantic graphics for data annotation with a fully convolutional neural network (FCN) consisting of an encoder and an expanding decoder, showing an average ripeness prediction accuracy of 88.57%. Table 1 compares the related works on strawberry ripeness estimation mentioned above.
Table 1. Comparison of related works on strawberry ripeness prediction.

| Source | Proposed Approach | Accuracy | Pros | Cons |
|---|---|---|---|---|
| [31] | (1) Combination of YOLOv7 and AR to detect and visualize the ripeness of strawberries. (2) Multiscale training of YOLO to overcome challenges in detecting small objects. | F1 score of 0.92 | Demonstrated superior performance compared to other state-of-the-art methods. | Not suitable for embedded environments because it requires many computing elements. |
| [32] | (1) Strawberry ripeness is detected with a semantic segmentation model. (2) A two-path convolution method is proposed to reduce the number of model parameters. | 90.33% | The two-path convolution method effectively reduced the number of parameters. | The strawberry image dataset is small and lacks diversity. |
| [33] | (1) RGB feature extraction: to obtain the color information of strawberry images, the mean and standard deviation of each color component are calculated. (2) The k-NN algorithm classifies the strawberry images into ripeness categories. | 85% | Low computational cost compared to other methods. | RGB features alone cannot fully distinguish the degree of ripening of strawberries. |
| [34] | (1) An SVM with an RBF kernel is used to classify ripeness. (2) A prototype strawberry classification system was built with real-time video. | 85.64% | SVM is a simple and lightweight classification method. | The number of categories that can be classified with the RBF kernel is limited. |
| [35] | (1) Semantic graphics for data annotation: simple graphic elements are used to display the quantity and ripeness of strawberries. (2) An FCN is used to recognize and learn the semantic graphics. | 88.57% | Desired targets can be efficiently tagged with semantic graphics. | The external characteristics of strawberries are not fully represented. |
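The color features used in [33] (mean and standard deviation of each RGB component) and the RGB-to-HSV conversion in [34] can be sketched with Python's standard library as follows. The sketch assumes that the pixels of an already segmented strawberry region are available as (r, g, b) tuples in the range 0..255. It omits the shape features (area, roundness, slenderness) described for [33], and it uses hypothetical sample pixels and a hypothetical function name, `color_features`. The resulting values could then be passed to a k-NN or SVM classifier.

```python
import colorsys
import statistics


def color_features(pixels):
    """Per-channel RGB mean and standard deviation, plus the HSV of the
    mean color. pixels: list of (r, g, b) tuples with values in 0..255."""
    feats = {}
    for idx, name in enumerate("RGB"):
        channel = [p[idx] for p in pixels]
        feats[name + "_mean"] = statistics.fmean(channel)
        feats[name + "_std"] = statistics.pstdev(channel)
    # Convert the mean color instead of averaging per-pixel hue:
    # hue is circular, and red (the ripe color) lies near both 0 and 1.
    h, s, v = colorsys.rgb_to_hsv(
        feats["R_mean"] / 255, feats["G_mean"] / 255, feats["B_mean"] / 255
    )
    feats.update(H=h, S=s, V=v)
    return feats


ripe = [(200, 20, 30), (220, 30, 40)]        # hypothetical reddish pixels
unripe = [(180, 210, 170), (170, 200, 160)]  # hypothetical pale green pixels
print(color_features(ripe)["R_mean"])         # 210.0
print(round(color_features(unripe)["H"], 3))  # 0.292
```

The reddish sample yields a hue close to 1 (red wraps around the hue circle), while the pale green sample yields a hue near 0.29, so even this small feature set separates the two examples.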

This entry is adapted from the peer-reviewed paper 10.3390/electronics13020344
