Machine Learning in Agricultural Big Data

Please note this is a comparison between Version 1 by Ania Cravero and Version 2 by Nora Tang.

Agricultural Big Data is a set of technologies for responding to the challenges of the new data era. In conjunction with machine learning, it lets farmers use data to address problems in decision making, water management, soil management, crop management, and livestock management. Crop management includes yield prediction, disease detection, weed detection, crop quality, and species recognition, whereas livestock management covers animal welfare and livestock production.

Big Data
machine learning
agriculture

1. Machine Learning

Machine learning (ML) is a research field that formally focuses on the theory, performance, and properties of learning systems and algorithms. It is a highly interdisciplinary field that draws on areas such as artificial intelligence, optimization theory, information theory, statistics, cognitive science, optimum control, and many other scientific, engineering, and mathematical disciplines ^[1][13]. Because of its many applications, ML has reached almost every scientific domain and has had a significant impact on science and society ^[2][14]. It is applied to recommendation engines, recognition systems, informatics and data mining, and autonomous control systems ^[3][15].

Depending on the nature of the feedback available to a learning system, ML can be classified into three main types: supervised learning, unsupervised learning, and reinforcement learning.

Briefly, supervised learning and unsupervised learning mainly focus on data analysis, while reinforcement learning is preferred for decision-making problems.
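The difference between the first two types can be illustrated with a minimal sketch. The data, the "greenness score" feature, and the function names below are hypothetical and are not taken from the cited studies; reinforcement learning is omitted because it requires an interactive environment. The supervised part uses a one-nearest-neighbor rule on labeled examples, and the unsupervised part uses a two-cluster k-means on unlabeled values.

```python
# Toy illustration only: all values and names below are hypothetical.

def nearest_neighbor_predict(train, query):
    """Supervised: return the label of the closest labeled example."""
    closest = min(train, key=lambda pair: abs(pair[0] - query))
    return closest[1]

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means, k=2)."""
    low, high = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        # Assign every value to its nearest centroid.
        cluster_low = [v for v in values if abs(v - low) <= abs(v - high)]
        cluster_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centroid to the mean of its cluster.
        low = sum(cluster_low) / len(cluster_low)
        if cluster_high:
            high = sum(cluster_high) / len(cluster_high)
    return sorted(cluster_low), sorted(cluster_high)

# Supervised: leaf greenness score paired with a label supplied by an expert.
labeled = [(0.2, "diseased"), (0.3, "diseased"), (0.8, "healthy"), (0.9, "healthy")]
prediction = nearest_neighbor_predict(labeled, 0.75)

# Unsupervised: the same kind of scores, but with no labels at all.
clusters = two_means([0.2, 0.3, 0.8, 0.9, 0.25, 0.85])

print(prediction)
print(clusters)
```

In the supervised case the feedback is the label attached to each training example; in the unsupervised case the algorithm only receives the values and must find the grouping itself.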

In general, the goal of ML algorithms is to optimize the performance of a task by exploiting examples or past experiences. In doing so, ML can generate efficient relationships between data inputs and reconstruct a knowledge schema to analyze large data volumes ^[4][16].

Deep learning (DL) is a branch of ML that tries to model abstractions in data by using a deep architecture with multiple processing layers. DL, which is of great interest in the artificial intelligence field, has come to the fore in natural language processing and image classification ^[5][17].

DL includes algorithms such as convolutional neural networks (CNN), recurrent neural networks (RNN), restricted Boltzmann machines (RBM), and deep belief networks (DBN). Furthermore, DL has the advantages of processing unstructured data at maximum capacity, producing high-quality results, and avoiding unnecessary costs.

ML has been used to solve different agricultural problems in crop management, including yield prediction, disease detection, weed detection, crop quality, and species recognition; in livestock management, including animal welfare and livestock production; in water management; and in soil management ^[6][4][5][9,16,17].

For example, many producers consider weeds the most severe threat to crop production. Accurate weed detection is essential for sustainable agriculture because weeds are difficult to detect and distinguish from crops. ML algorithms, along with sensors, now allow accurate detection and identification of weeds without causing environmental problems or secondary effects. ML for weed detection has led to the development of tools and robots that destroy weeds, minimizing the need for herbicides ^[6][9]. Likewise, accurate detection and classification of crop quality characteristics have increased product value and reduced waste.

On the other hand, Liakos et al. point out that when data recordings are involved, occasionally at the level of Big Data, the implementations of ML are less in number, mainly because of the increased efforts required for the data analysis task, not for the ML models per se. This fact partially explains the almost equal distribution of ML applications in livestock management (19%), water management (10%), and soil management (10%) ^[6][9].

ANNs remain the preferred ML algorithm for data analysis, although ensemble learning has been gaining ground and outperforms other algorithms such as SVM and decision trees (DT). According to Benos et al., the most commonly used data come from meteorology, soil, water and crop quality, remote sensing, satellite imagery, UAVs, and in situ and laboratory measurements ^[4][16]. The ML model that most frequently provided the best output was, by far, ANN, which appeared in about half of the reviewed studies (51.8%). Within ANNs, RNNs accounted for approximately 10%, with long short-term memory (LSTM) standing out. The second most accurate ML model was ensemble learning (EL), which accounted for approximately 22.2% of the ML models used in agricultural systems, and regression models came next at 4.7%. Both of these ML models were present in all generic categories.

Benos et al. document the increasing interest in ML analyses in agricultural applications. Between 2018 and 2019, the number of relevant studies increased by 26%. In 2020, the number rose by 109% compared with 2019, resulting in an overall rise of 164% compared with 2018 ^[4][16].
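Year-over-year increases compound rather than add, which is why the overall rise is much larger than 26% plus 109%. The short check below multiplies the two growth factors given in the text. It yields roughly 163%, which is consistent with the reported 164% if, as assumed here, the published percentages were rounded from the underlying study counts. The variable names are illustrative.

```python
# Year-over-year increases reported in the text, expressed as fractions.
increase_2019 = 0.26   # 2019 vs. 2018
increase_2020 = 1.09   # 2020 vs. 2019

# Compound the two growth factors to get 2020 relative to 2018.
overall_factor = (1 + increase_2019) * (1 + increase_2020)
overall_increase_pct = (overall_factor - 1) * 100

print(f"2020 vs. 2018: +{overall_increase_pct:.1f}%")
```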

The increased interest in ML research in agriculture is a consequence of several factors: the considerable advances in ICT systems in agriculture; the vital need to increase the efficiency of agricultural practices while reducing the environmental burden; and the need for reliable measurements with the handling of large volumes of data ^[4][5][16,17].

2. Big Data

Big Data is defined in four dimensions (four Vs) ^[7][18]. First, it refers to the enormous volume of generated, stored, and processed data. Second, it refers to the high velocity of data transmission in interactions and the rates at which data are generated, collected, and exchanged. Third, it refers to the variety of data formats and structures (structured, semi-structured, and unstructured) resulting from the heterogeneity of data sources ^[8][19]. The fourth dimension is veracity, which refers to the ability to validate the quality of the data used in the analyses.

Apart from the “4 Vs”, another dimension of Big Data must also be considered: its value. The value is obtained by analyzing data to extract hidden patterns, trends, and knowledge models through algorithms and smart data analysis techniques. Data science methods increase the value of data by better understanding their phenomena and behaviors, optimizing processes, and improving the discoveries of machines, businesses, and scientists ^[9][20].

In practice, Big Data analysis tools enable data scientists to discover correlations and patterns by analyzing massive quantities of data from different sources. In recent years, the science of Big Data has become an essential modern discipline for data analysis ^[10][21]. It is considered an amalgam of classic disciplines such as statistics, artificial intelligence, mathematics, and informatics with its sub-disciplines, including database systems, ML, and distributed systems ^[11][22].

The Big Data ecosystem handles the evolution of data, models, and support infrastructure throughout the data life cycle; it is a whole set of components, or architecture, that stores, processes, and visualizes data and delivers results to guide applications ^[12][13][23,24].

The Big Data process starts with the identification of the sources from which useful data are extracted ^[7][18]. Next, the data are stored in one of the designed data models, depending on whether the data are structured or not. In the following step, the data are classified and filtered according to the type of analysis required. Then, it is decided whether the processing will be done in batch, as a stream, or in memory ^[14][25]. The classified data are analyzed using appropriate tools such as DL ^[15][26], ad hoc analysis ^[14][25], and data science in general ^[16][27]. The results must then be presented through some kind of visualization tool. Finally, decision makers interpret the results ^[13][24].
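The sequence of steps described above can be sketched as a small pipeline. This is a simplified illustration under stated assumptions, not an implementation from the cited works: the records, field names, the moisture threshold, and all function names are hypothetical, only batch processing is shown, and the "visualization" is a plain-text report.

```python
from statistics import mean

# Hypothetical raw records from two sources; field names are illustrative only.
RAW = [
    {"source": "soil_sensor", "field": "A", "moisture": 0.31},
    {"source": "soil_sensor", "field": "A", "moisture": 0.27},
    {"source": "soil_sensor", "field": "B", "moisture": 0.12},
    {"source": "field_note", "text": "weeds near the north fence"},
]

def store(records):
    """Route each record to a structured or an unstructured store."""
    stores = {"structured": [], "unstructured": []}
    for rec in records:
        kind = "structured" if "moisture" in rec else "unstructured"
        stores[kind].append(rec)
    return stores

def filter_for_analysis(stores):
    """Keep only the records relevant to a soil moisture analysis."""
    return [r for r in stores["structured"] if r["source"] == "soil_sensor"]

def analyze_batch(records):
    """Batch processing: average moisture per field."""
    by_field = {}
    for rec in records:
        by_field.setdefault(rec["field"], []).append(rec["moisture"])
    return {field: round(mean(vals), 2) for field, vals in by_field.items()}

def visualize(summary, threshold=0.2):
    """Render a plain-text report; the threshold is an arbitrary example value."""
    lines = []
    for field, value in sorted(summary.items()):
        flag = "irrigate?" if value < threshold else "ok"
        lines.append(f"field {field}: moisture {value:.2f} ({flag})")
    return "\n".join(lines)

# Source identification -> storage -> filtering -> analysis -> visualization.
summary = analyze_batch(filter_for_analysis(store(RAW)))
report = visualize(summary)
print(report)  # the final step, interpretation, is left to the decision maker
```

Keeping each stage as a separate function mirrors the staged description in the text and makes it straightforward to replace the batch step with a streaming or in-memory variant.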

Big Data in agriculture refers to all the modern technology available, combined with data analysis, as a foundation for making decisions based only on data ^[17][28].

Big Data has been used to improve various aspects of agriculture, such as knowledge about weather and climate change, land, animal research, crops, soil, weeds, food availability and security, biodiversity, farmers’ decision making, farmers’ insurance and finance, and remote sensing ^[18][29]. It is also used to create platforms that allow the supply chain actors to have access to high-quality products and processes, tools to improve yields and predict demand, and advice and guidance to farmers based on the response capacity of their crops to fertilizers leading to better fertilizer use. Furthermore, Big Data has led to the introduction of plant-scanning equipment used to follow up on deliveries and allow retailers to monitor consumer purchases by improving product traceability throughout the supply chain ^[19][30].

Big Data does not function in isolation. It has been used with other technologies such as ML, cloud-based platforms, image processing, modeling and simulation, statistical analysis, vegetation indices such as NDVI, and geographic information systems (GIS) ^[18][29]. ML tools have been used in prediction, grouping, and classification problems, while image processing has been used when the data are extracted from images (e.g., from cameras and remote sensing) ^[18][29].
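As an illustration of one of these companion techniques, NDVI is commonly computed from near-infrared (NIR) and red reflectance as (NIR − Red) / (NIR + Red), so that dense green vegetation scores close to 1 and bare soil scores close to 0. The source does not give the formula or any data; the reflectance values, the pixel names, and the handling of a zero denominator below are assumptions made for this sketch.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # convention chosen here for pixels with no signal (an assumption)
    return (nir - red) / total

# Hypothetical (nir, red) reflectance pairs for three pixels.
pixels = {
    "dense canopy": (0.50, 0.08),
    "sparse cover": (0.30, 0.20),
    "bare soil": (0.25, 0.22),
}
scores = {name: round(ndvi(nir, red), 2) for name, (nir, red) in pixels.items()}

for name, score in scores.items():
    print(f"{name}: NDVI {score:.2f}")
```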

3. Challenges in Agricultural Big Data and ML

Several authors have explained a number of challenges when using Big Data or ML in data analysis for agricultural development.

White et al. conducted a survey with researchers participating in a conference on precision farming to identify different scenarios and challenges where agricultural Big Data is used: (1) mid-season yield prediction for real-time decision making, (2) sow lameness, (3) irrigation in cotton management, (4) in-season decision making, (5) policymaker perspective, (6) cropping selection system, (7) business analytics for agriculture, (8) grower perspective, (9) consumer perspective, and (10) a benchmarking scenario that compares individual grower yields with modeled outputs based on other people’s data ^[20][2]. The data challenges identified for these scenarios are errors, inaccessibility, unusability, incompatibility, and inconvenience. For example, the lack of data interoperability prevents integration and unified analysis of data collected by multiple sensors and platforms. The lack of rural bandwidth often makes data transmission impossible, particularly for large data sets that include images. In addition, sensor data require calibration. Finally, the authors indicated that better crop growth models and more specific weather forecasts for individual farms and fields are required ^[20][2].
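The interoperability problem mentioned above can be made concrete with a small sketch: two platforms report the same quantity under different field names and units, so their records must be mapped to a common schema before any unified analysis is possible. Both vendor formats, the field names, and the target schema are hypothetical and do not correspond to any real product or standard.

```python
def from_vendor_a(rec):
    """Hypothetical vendor A reports Celsius under 'temp_c' and a 'plot' id."""
    return {"field_id": rec["plot"], "temperature_c": rec["temp_c"]}

def from_vendor_b(rec):
    """Hypothetical vendor B reports Fahrenheit under 'tempF' and a 'fieldName'."""
    celsius = (rec["tempF"] - 32) * 5 / 9
    return {"field_id": rec["fieldName"], "temperature_c": round(celsius, 1)}

# Two readings for the same field, delivered in incompatible formats.
readings = [
    from_vendor_a({"plot": "A", "temp_c": 21.5}),
    from_vendor_b({"fieldName": "A", "tempF": 68.0}),
]

# Only after harmonization can the readings be analyzed together.
mean_temp = sum(r["temperature_c"] for r in readings) / len(readings)
print(f"field A mean temperature: {mean_temp:.2f} C")
```

Without an agreed standard, one such adapter is needed per data source, which is part of why harmonizing data from many platforms is costly.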

Lassoued et al. analyzed the impact and potential of Big Data in agriculture. They identified several challenges related to data sources because not all the segments in the value chain capture data the same way. They pointed out that there is no standard by which the data are captured, making it difficult to harmonize and compile the data from various sources ^[21][7]. Through a survey, they also learned that the implementation of Big Data in an organization depends on a clear strategy and on trained personnel to administer large volumes of data. Training and talent, more than capital, are fundamental for optimal production in the future ^[21][7]. Another major obstacle identified is data governance. Although most of the experts surveyed were willing to share their data under certain conditions, many expressed concerns about data privacy, security, cybercrime, and intellectual property protection.

Bhat and Huang conducted a study on the application of artificial intelligence and Big Data in agriculture. They indicated several challenges when applying Big Data in real life. One of these challenges is the compilation and analysis of large volumes of data produced through IoT and wireless sensor networks. These data include digital images and data from UAVs and satellites, and their integration poses difficulties for the effective execution of smart farming. The authors explained that most Big Data systems are adequate for large industrial farms because these farms have the infrastructure to access data, resources, and, most importantly, funding. However, they found only a few examples of small farming operations in the developing world. Big Data has the potential to support non-industrial farms; however, the moral and ethical questions concerning availability, cost, and financing must be addressed to achieve these advantages ^[22][3].

Bhat and Huang also examined data collection and analysis challenges. The combination of data from various sources causes concern about the quality of the information and its merging. Moreover, the volume of information compiled causes concern about security and protection. The compiled data sets are enormous and complex, making them challenging to manage with standard procedures of smart analysis, which do not usually work well when applied to agricultural data. The authors expect scalable and versatile methods to adapt to large amounts of information ^[22][3].

Because agricultural data sets contain varied information about soil, climate, seeds, cultivation practices, irrigation facilities, fertilizers, pesticides, weeds, harvesting, post-harvest techniques, and others, challenges arise at different stages of agricultural Big Data, such as data collection, storage, and analysis ^[23][4]. Moreover, the data are generated and maintained by governments, universities, research organizations, farming companies, and agricultural input companies for agricultural production, insurance, marketing, supply chain, packaging, distribution, etc. ^[23][4]. Due to this multimodal nature of the data, there are several challenges, such as the need for improved data collection methods, better statistical techniques, and more effective and efficient data analytics to understand and support the functions of several agricultural verticals. In addition, Weersink et al. explained that the data must be collected consistently and follow protocols that allow them to be grouped on centralized servers. These servers must be protected from cyberattacks while masking the identity of the operation managers ^[24][31].

Coble et al. analyze the challenges and opportunities of Big Data in agriculture and conclude that these technologies will lead to relevant analytics at every stage of the agricultural value chain. The authors believe that there are relevant policy, farm management, supply chain, consumer demand, and sustainability issues. A significant challenge mentioned by the authors is the management of data repositories due to the volume and variety ^[25][32]. According to Coble et al., data service providers struggle to attract a critical mass of farmers to submit farm data to repositories. This concern is partly because the value of an agricultural data community ultimately depends on the number of farms and acres in the system, i.e., the size of the network. Concerning data variability, different levels of data quality are available, e.g., some farmers are known for not correctly labeling on-farm production data or for not considering all sowing data. These aspects are of utmost importance for the system to deliver a proper analysis and the farmer to make a correct decision. The authors point out that progress must be made in creating public data repositories, engaging both large and small farmers in real collaboration.

Misra et al. present an overview of Big Data, AI, and IoT and their disruptive role in shaping the future of agri-food systems ^[26][33]. The authors discuss these technologies in greenhouse monitoring, smart farm machines, drone-based crop imaging, supply chain modernization, social media (for open innovation and sentiment analysis) in the food industry, food quality assessment (using spectral methods and sensor fusion), and food safety. They indicate an economic impact in terms of higher productivity, lower cost of production, and improved quality. Therefore, adopting technological innovations and taking advantage of them is essential for modern agriculture and the food industry.