Agricultural Big Data and Machine Learning: Comparison
Please note this is a comparison between Version 2 by Dean Liu and Version 1 by Ania Cravero.

Sustainable agriculture is currently being challenged under climate change scenarios, since extreme environmental processes disrupt and diminish global food production. For example, drought-induced increases in plant diseases, together with changes in rainfall, have decreased food production. Machine Learning (ML) and Agricultural Big Data are high-performance computing technologies that make it possible to analyze large amounts of data in order to understand agricultural production.

  • agriculture
  • big data
  • machine learning
  • data type

1. Agricultural Big Data

Big Data is a type of technology used when the solution is not trivial due to the complexity of the data. It is usually defined through four dimensions (the 4 V’s). The first V, volume, represents the amount of data generated from a data source, stored, and processed for further analysis. The second V, variety, refers to the multiple formats, structures, and sizes in which data arrive: data can be extracted as raw or unstructured, semi-structured, or structured data. The third V, velocity, refers to the speed at which data must be transmitted in order to be processed and analyzed. The fourth V, veracity, refers to the capability to validate the quality of the data [1].
Big Data allows scientists and engineers to find patterns and trends by examining large amounts of data from multiple origins. In recent years, Big Data science has become an essential modern discipline for data analysis [2]. It combines classical domains such as statistics, mathematics, and computer science with artificial intelligence, including ML. In general, it involves database systems, ML, and distributed systems [3].
The Big Data process begins with specifying the sources from which the required data will be extracted [4]. The next step is storing the data in a repository that depends on their processing level: unstructured, semi-structured, or structured. The data are then transformed through filtering and sorting to improve their quality for the various analyses [5]. Next, the classified data are analyzed using analytical tools and algorithms (e.g., Deep Learning (DL), ML, OLAP) [6], as well as general data science [5,7]. This allows decision makers to analyze the data and visualize trends [8].
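The steps just described (extract, store, transform, analyze) can be sketched as a small pipeline. This is only an illustrative sketch: the record layout, the field names, and the function names are assumptions made for the example, and a real Big Data system would replace each function with a distributed component.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records, as they might arrive from two sources.
RAW_RECORDS = [
    {"source": "sensor", "field": "A", "soil_moisture": 31.5},
    {"source": "sensor", "field": "A", "soil_moisture": None},  # missing reading
    {"source": "sensor", "field": "B", "soil_moisture": 22.0},
    {"source": "survey", "field": "B", "soil_moisture": 24.0},
    {"source": "survey", "field": "A", "soil_moisture": 28.5},
]


def extract(records):
    """Step 1: pull records from the specified sources."""
    return list(records)


def store(records):
    """Step 2: keep the data in a repository (here, an in-memory dict keyed by source)."""
    repository = defaultdict(list)
    for record in records:
        repository[record["source"]].append(record)
    return repository


def transform(repository):
    """Step 3: filter out incomplete rows and sort the remainder."""
    rows = [
        r
        for source_rows in repository.values()
        for r in source_rows
        if r["soil_moisture"] is not None
    ]
    return sorted(rows, key=lambda r: (r["field"], r["soil_moisture"]))


def analyze(rows):
    """Step 4: a deliberately simple analysis, the mean value per field."""
    by_field = defaultdict(list)
    for r in rows:
        by_field[r["field"]].append(r["soil_moisture"])
    return {field: mean(values) for field, values in by_field.items()}


def run_pipeline(records):
    return analyze(transform(store(extract(records))))


if __name__ == "__main__":
    for field, avg in sorted(run_pipeline(RAW_RECORDS).items()):
        print(f"field {field}: mean soil moisture {avg:.1f}")
```

Keeping each stage as a separate function mirrors the layered description: any stage can be swapped for a more capable tool without touching the others.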
For example, Semlali et al. [9] use Big Data tools to monitor atmospheric composition. The system architecture comprises the data source layer, ingestion, storage via Hadoop, the data management layer, infrastructure, and the monitoring and security layer. They used data on pollutant gas emissions from sources such as agriculture, enterprise, and transport. The authors were able to continuously monitor the atmospheric composition through remote sensing. Figure 1 depicts the complete process.
Figure 1. Architecture of Big Data for atmospheric composition monitoring.
A MySQL database stores the data once they have been extracted, processed, and transformed. Scripts written in Python, Java, and Bash are used to read the raw data in Hadoop. Figure 2 shows an example of the process.
Figure 2. Representation of the ingestion process.
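An ingestion step of this kind can be sketched as follows. This is not the authors' code: the standard-library sqlite3 module stands in for MySQL, an inline CSV string stands in for raw files held in Hadoop, and the table name, column names, and values are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in the described system the raw files sit in Hadoop.
RAW_CSV = """station,gas,value
S1,NO2,41.2
S1,CO,0.7
S2,NO2,not_available
S2,CO,0.9
"""


def ingest(raw_text, connection):
    """Parse raw rows, skip unparseable measurements, and load the rest into a table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS emissions (station TEXT, gas TEXT, value REAL)"
    )
    loaded = 0
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            value = float(row["value"])
        except ValueError:
            continue  # skip rows whose measurement cannot be parsed
        connection.execute(
            "INSERT INTO emissions (station, gas, value) VALUES (?, ?, ?)",
            (row["station"], row["gas"], value),
        )
        loaded += 1
    connection.commit()
    return loaded


def mean_by_gas(connection):
    """Query the relational store for the average value of each gas."""
    cursor = connection.execute(
        "SELECT gas, AVG(value) FROM emissions GROUP BY gas ORDER BY gas"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(ingest(RAW_CSV, conn), "rows loaded")
    print(mean_by_gas(conn))
```

The parameterized INSERT keeps parsing and loading separate, so the same function would work against another relational database with a compatible driver.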
Another example is Alex et al. [10], who developed a Big Data system that predicts whether fertilizers will cause disease in crops. They used data such as soil moisture, average rainfall, and soil nutrients. They also used measurements of phosphorus (P), nitrogen (N), magnesium (Mg), calcium (Ca), and sulfur (S). The Big Data process starts with data enrichment, followed by data clustering, so that the data can be classified and analyzed to deliver recommendations. Finally, the Hadoop ecosystem was used to store and process the data analyzed with ML. Figure 3 depicts the complete process.
Figure 3. Big Data architecture for fertilizer management and yield prediction.
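The data clustering step can be illustrated with a minimal k-means sketch. The text does not say which clustering algorithm the authors used, so k-means is an assumption chosen for illustration, and the nitrogen and phosphorus values below are invented.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # deterministic start: the first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical soil samples as (nitrogen, phosphorus) pairs in arbitrary units.
SAMPLES = [(10, 5), (60, 40), (12, 6), (58, 42), (11, 4), (62, 39)]

if __name__ == "__main__":
    labels, centers = kmeans(SAMPLES, k=2)
    for sample, label in zip(SAMPLES, labels):
        print(sample, "-> cluster", label)
    print("centroids:", centers)
```

Once samples are grouped, each cluster can be inspected or labeled separately, which is the role clustering plays before classification in the process described above.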
Big Data enables data scientists and farmers to understand many aspects of farming, such as weather, land, soil, crops, animal production, weeds, food safety, biodiversity, remote sensing, farmer decision making, insurance, financing, and climate change [11]. It also enables the development of supply chain platforms that give agents access to high-quality products, processes, and tools capable of improving performance, predicting demand, and advising farmers according to crop needs, such as the appropriate use of fertilizers.

2. Machine Learning

ML is considered an area of research focusing on the mathematical theory, system characteristics, and performance of learning algorithms. The field is interdisciplinary, encompassing disciplines such as artificial intelligence, knowledge theory, optimization, statistics, cognitive science, control, mathematics, and engineering [12]. ML is attractive because it can be used in various scientific domains, with significant impact on society. For example, ML can be used to solve problems such as pattern recognition, recommender systems, prediction, data mining, and automatic control systems [13,14].
ML algorithms can be classified into three types depending on the available data: supervised, unsupervised, and reinforcement learning. Table 1 summarizes these techniques, differentiating them from various points of view, particularly in data processing. The “Learning Algorithms” row lists the methods used, and the “Data Processing Tasks” row contains the problems to be solved. In ML, the data must be structured, so in most cases they must be preprocessed. This task consists of cleaning the data to remove noise and inconsistencies, integrating them if they are drawn from multiple sources, and transforming the data to normalize them [15].
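The three preprocessing tasks named above (cleaning, integration, and normalization) can be sketched in a few lines. The record layout, the rule that negative rainfall counts as noise, and the choice of min-max scaling are assumptions made for this example only.

```python
def clean(records):
    """Drop records with missing or physically impossible values (noise, inconsistencies)."""
    return [
        r for r in records
        if r.get("rainfall_mm") is not None and r["rainfall_mm"] >= 0
    ]


def integrate(*sources):
    """Merge records from several sources, keeping the first record seen per plot id."""
    merged = {}
    for source in sources:
        for r in source:
            merged.setdefault(r["plot"], r)
    return list(merged.values())


def normalize(records, key):
    """Min-max scale one numeric field into the range [0, 1]."""
    values = [r[key] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero when all values are equal
    return [{**r, key: (r[key] - low) / span} for r in records]


def preprocess(*sources):
    cleaned = [clean(source) for source in sources]
    return normalize(integrate(*cleaned), "rainfall_mm")


# Hypothetical inputs from two sources with overlapping plots.
STATION = [
    {"plot": "p1", "rainfall_mm": 20.0},
    {"plot": "p2", "rainfall_mm": -5.0},   # impossible value, treated as noise
    {"plot": "p3", "rainfall_mm": None},   # missing value
]
SURVEY = [
    {"plot": "p2", "rainfall_mm": 60.0},
    {"plot": "p4", "rainfall_mm": 100.0},
    {"plot": "p1", "rainfall_mm": 25.0},   # duplicate plot, the station value wins
]

if __name__ == "__main__":
    for row in preprocess(STATION, SURVEY):
        print(row)
```

Cleaning each source before integration means a noisy record in one source cannot shadow a valid record for the same plot in another.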
Table 1. Machine Learning techniques.
Classification | Supervised Learning | Unsupervised Learning | Reinforcement Learning
Data processing tasks | Classification; Estimation; Regression | Prediction; Clustering | Decision-making
Learning algorithms | Support vector machine; Hidden Markov model; Naive Bayes; Neural networks; Bayesian networks | Gaussian mixture model; K-means; X-means | Q-learning; Sarsa learning; TD-learning; R-learning
Supervised and unsupervised learning are primarily focused on data analysis, whereas reinforcement learning is used in decision-making situations. The ML algorithms presented in Table 1 can optimize the performance of a task by analyzing data samples or past experience. An important aspect is that ML tends to perform better when larger volumes of data are available to be explored [15].
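To make the decision-making role of reinforcement learning concrete, here is a minimal sketch of the tabular Q-learning update on a toy problem. The corridor environment, the reward, and the parameter values are assumptions for illustration. To keep the sketch deterministic, transitions are replayed in a fixed order rather than sampled by an exploring agent, which is a simplification of how Q-learning is normally run.

```python
ACTIONS = ("left", "right")
N_STATES = 4            # states 0..3; state 3 is the goal (terminal)
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)


def step(state, action):
    """Toy deterministic environment: move along a corridor, reward 1 on reaching the goal."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def train(sweeps=200):
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(sweeps):
        for s in range(N_STATES - 1):  # the goal state has no outgoing actions
            for a in ACTIONS:
                nxt, reward = step(s, a)
                future = 0.0 if nxt == N_STATES - 1 else max(q[nxt].values())
                # Q-learning update: move Q(s, a) toward reward + gamma * max Q(s', a')
                q[s][a] += ALPHA * (reward + GAMMA * future - q[s][a])
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

The learned table encodes a decision rule (which action to take in each state) rather than a label or a cluster, which is the distinction the paragraph above draws.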
ML algorithms have been used to improve livestock welfare; increase livestock production; improve yield prediction, crop management, disease detection, weed detection, crop quality improvement, and species distinction; and improve water and soil management [11,15,16,17].
There are numerous challenges when applying ML to Agricultural Big Data, since some techniques must be adapted when the volume of data is large or the data are variable. There are also challenges in validating the data and obtaining a quality data set. Solving these challenges is not trivial, but proposals have been made that allow progress in this area of research [11].
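One common way of adapting a computation to a large volume of data is to process it incrementally, so that the full data set never has to sit in memory. The sketch below uses Welford's online algorithm for the mean and variance. It is a generic illustration of the idea, not a technique taken from the cited review, and the chunked input is hypothetical.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in one pass with constant memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance; returns 0.0 when fewer than two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def stream_stats(chunks):
    """Consume the data chunk by chunk, as if each chunk were read from disk in turn."""
    stats = RunningStats()
    for chunk in chunks:
        for value in chunk:
            stats.update(value)
    return stats


if __name__ == "__main__":
    # Hypothetical measurements arriving in four chunks.
    chunks = [[2.0, 4.0], [4.0, 4.0], [5.0, 5.0], [7.0, 9.0]]
    stats = stream_stats(chunks)
    print(stats.count, stats.mean, stats.variance)
```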

References

  1. Sassi, I.; Ouaftouh, S.; Anter, S. Adaptation of Classical Machine Learning Algorithms to Big Data Context: Problems and Challenges. In Proceedings of the 2019 1st International Conference on Smart Systems and Data Science (ICSSD), Rabat, Morocco, 3–4 October 2019; pp. 1–7.
  2. Elshawi, R.; Sakr, S.; Talia, D.; Trunfio, P. Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service. Big Data Res. 2018, 14, 1–11.
  3. Haig, B.D. Big data science: A philosophy of science perspective. In Big Data in Psychological Research; Woo, S.E., Tay, L., Proctor, R.W., Eds.; American Psychological Association: Washington, DC, USA, 2020; pp. 15–33.
  4. Santos, M.; e Sá, J.; Costa, C.; Galváo, J.; Andrade, C.; Martinho, B.; Lima, F.; Costa, E.; Lima, F. A big data analytics architecture for industry 4.0. In Proceedings of the World Conference on Information Systems and Technologies, Madeira, Portugal, 11–13 April 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 175–184.
  5. Salma, C.A.; Tekinerdogan, B.; Athanasiadis, I.N. Chapter 4—Domain-Driven Design of Big Data Systems Based on a Reference Architecture; Morgan Kaufmann: Burlington, MA, USA, 2017; pp. 49–68.
  6. Sowmya, R.; Suneetha, K. Data mining with big data. In Proceedings of the 2017 11th International Conference on Intelligent Systems and Control (ISCO), Coimbatore, India, 5–6 January 2017; pp. 246–250.
  7. Song, I.Y.; Zhu, Y. Big data and data science: What should we teach? Expert Syst. 2016, 33, 364–373.
  8. Demchenko, Y.; De-Laat, C.; Membrey, P. Defining architecture components of the big data ecosystem. In Proceedings of the 2014 International Conference on Collaboration Technologies and Systems, CTS 2014, Minneapolis, MN, USA, 19–23 May 2014; pp. 104–112.
  9. Semlali, B.E.B.; Amrani, C.E.; Ortiz, G. Hadoop paradigm for satellite environmental big data processing. Int. J. Agric. Environ. Inf. Syst. 2020, 11, 23–47.
  10. Alex, S.A.; Kanavalli, A. Intelligent computational techniques for crops yield prediction and fertilizer management over big data environment. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 3521–3526.
  11. Cravero, A.; Pardo, S.; Sepúlveda, S.; Muñoz, L. Challenges to Use Machine Learning in Agricultural Big Data: A Systematic Literature Review. Agronomy 2022, 12, 748.
  12. Cherkassky, V.; Mulier, F. Learning from Data: Concepts, Theory, and Methods; John Wiley & Sons: Hoboken, NJ, USA, 2007.
  13. Rudin, C.; Wagstaff, K. Machine learning for science and society. Mach. Learn. 2014, 95, 1–9.
  14. Qiu, J.; Wu, Q.; Ding, G.; Xu, Y.; Feng, S. A survey of machine learning for big data processing. Eurasip J. Adv. Signal Process. 2016, 1, 1–16.
  15. Benos, L.; Tagarakis, A.C.; Dolias, G.; Berruto, R.; Kateris, D.; Bochtis, D. Machine Learning in Agriculture: A Comprehensive Updated Review. Sensors 2021, 21, 3758.
  16. Liakos, K.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine Learning in Agriculture: A Review. Sensors 2018, 18, 2674.
  17. Bal, S.K. Agro-meteorological basis of extremes of temperature with special perspective to livestock and poultry. Clim. Resilient Anim. Husb. 2021, 23.