GIS-Based Cropland Suitability Prediction Using Machine Learning: Comparison
Please note this is a comparison between Version 1 by Dorijan Radočaj and Version 2 by Vivi Li.

The increasing global demand for food has forced farmers to produce higher crop yields in order to keep up with population growth, while maintaining sustainable production for the environment. As knowledge about natural cropland suitability is mandatory to achieve this, the aim of this paper is to provide a review of methods for suitability prediction according to abiotic environmental criteria. The conventional method for calculating cropland suitability in previous studies was a geographic information system (GIS)-based multicriteria analysis, dominantly in combination with the analytic hierarchy process (AHP). Although this is a flexible and widely accepted method, it has significant fundamental drawbacks, such as a lack of accuracy assessment, high subjectivity, computational inefficiency, and an unsystematic approach to selecting environmental criteria. 

  • farmland
  • geographic information system
  • vegetation index
  • biophysical variables
  • Sentinel-2
  • analytic hierarchy process

1. Introduction

Accelerated global population growth necessitates the production of more and more food [1]. Conventional intensive agricultural production can meet short-term food demands, but it frequently comes at the price of long-term sustainability and land degradation [2]. An additional challenge is posed by climate change and pollution, which undermine the effectiveness of conventional cropland management [3]. The two most popular methods used in conventional agricultural production systems to improve crop yields are (1) transforming land cover to create new cropland and (2) modifying agrotechnical techniques, such as increasing the use of fertilizers and pesticides, so as to boost yields on existing cropland. Land use conversion has a greater potential to increase overall yields than improving agricultural practices on existing cropland [4]. However, new cropland converted from forest and wetland areas results in the destruction of natural habitats and poses a threat to biodiversity. Habitat destruction is the most common cause of flora endangerment, with a negative outlook for its recovery potential [5]. The use of fertilizers and pesticides is necessary for the continuous production of high and stable yields in conventional intensive agricultural production [6]. Fertilizer and pesticide applications continue to increase when the crop rotation system is not maintained, agrotechnical measures are inadequate, and certain agricultural crops are grown in an inherently unsuitable location [4,7]. This practice leads to environmental pollution through heavy metals (mainly copper), persistent organic pollutants, and excessive nitrogen and phosphorus entering waterways [8,9]. Such pollution has a direct impact on human health, flora, and fauna, spreading through surface and groundwater and accumulating in living organisms [10].
Alternative methods for increasing crop yields have been developed in response to the requirement for the long-term sustainability of rising agricultural production. On existing croplands, it is feasible to increase yields while using fewer fertilizers and pesticides through the traditional method of cultivating crops in naturally suitable locations [11]. Because important abiotic factors are either impossible or extremely difficult to change, crop rotation and agricultural management strategies must be modified to foster conditions for ecologically responsible and sustainable agriculture [12]. Because of the temporal variability of abiotic criteria, primarily caused by climate change, regular monitoring and analysis of changes in suitability levels are required [13]. Agrotechnics can directly benefit from an inventory of existing suitability levels, as it can be used to propose changes to agrotechnical measures and the installation of irrigation systems in sensitive regions [14]. Crop cultivation in naturally suited sites is one part of regionalized agricultural production that aims to achieve sustainable output (Figure 1).
Figure 1. The concept of regionalized agricultural production according to cropland suitability. The conceptualization of agricultural land management without determined suitability (left), predicted GIS-based cropland suitability (center), and regionalized agricultural production with land management plans optimized according to predicted cropland suitability (right).
The analytic hierarchy process (AHP) in conjunction with GIS-based multicriteria analysis is currently regarded as the standard for assessing cropland suitability. However, this method is prone to subjectivity and cannot effectively incorporate large quantities of data [15]. The use of machine learning techniques has improved the conventional methodology by addressing these flaws in the prediction of biological, chemical, and physical soil characteristics. As part of the SoilGrids project, Hengl et al. [16] combined soil samples and publicly accessible spatial data reflecting abiotic criteria in a novel machine-learning-based prediction. This marked the start of a paradigm change in the prediction of spatial environmental variables, opening the possibility for applications in fields other than soil science. Among these fields, cropland suitability prediction is one of the most convenient for the application of a similar machine-learning-based approach, as it depends on the same environmental criteria groups [15]. While the machine learning approach in cropland suitability prediction has fundamental similarities with SoilGrids, the most notable difference lies in training data selection (Figure 2). Cropland suitability has been defined in previous studies as a somewhat abstract term, quantified by in situ crop yield data [11] or by data derived from remote sensing satellite missions, such as vegetation indices [17] or biophysical variables [15].
Figure 2. Conceptual comparison of the machine learning prediction approach implemented in SoilGrids [16] and for cropland suitability calculation.

2. Advancements of the Conventional GIS-Based Multicriteria Analysis for Cropland Suitability Prediction

Previous studies predominantly considered GIS-based multicriteria analysis as the current standard for quantifying cropland suitability [11,18,19]. Because of its adaptability and universal application, it has become an essential approach in suitability research across a variety of scientific fields [20,21,22]. However, the basic flaws of this approach make it less effective as reliability and universal applicability become more important. The standard procedure of GIS-based multicriteria analysis includes several steps (Figure 3):
  • Defining the study aim,
  • Selecting relevant environmental criteria,
  • Standardizing criteria values,
  • Weighting of criteria,
  • Calculation of suitability and interpretation of the results.
Figure 3. The conventional procedure of GIS-based multicriteria analysis for cropland suitability prediction.
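The final calculation step is often carried out as a weighted linear combination of the standardized criteria. The following is a minimal sketch of that idea, assuming criteria already standardized to the 0–1 interval; the criterion names, weights, and raster values are hypothetical, and the reviewed studies may have used other aggregation rules.

```python
def weighted_linear_combination(criteria, weights):
    """Combine standardized criteria rasters (lists of rows) into one suitability raster."""
    if set(criteria) != set(weights):
        raise ValueError("criteria and weights must use the same names")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(criteria)
    n_rows = len(criteria[names[0]])
    n_cols = len(criteria[names[0]][0])
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for name in names:
        for r in range(n_rows):
            for c in range(n_cols):
                # Each cell is the weighted sum of the criteria values at that cell.
                result[r][c] += weights[name] * criteria[name][r][c]
    return result


# Hypothetical 2x2 rasters, already standardized to 0-1.
criteria = {
    "temperature": [[1.0, 0.5], [0.0, 0.5]],
    "soil_ph": [[0.5, 0.5], [1.0, 0.0]],
}
# Hypothetical weights, e.g. as produced by the weighting step.
weights = {"temperature": 0.6, "soil_ph": 0.4}
suitability = weighted_linear_combination(criteria, weights)
```

Because the weights sum to 1, the output stays in the same interval as the standardized inputs, which keeps the result directly comparable with the suitability classes.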
Previous research used the GIS-based multicriteria analysis to assess the suitability of agricultural crops in a specific geographic location. An important component of the studies analyzed is the timing of the analysis, which is conditioned by the availability of data from one or more consecutive seasons [23]. Knowing the timing of the analysis is also necessary because of the temporal variability of the basic abiotic criteria, especially the climate criteria [24]. The novel methods using machine learning and open-data satellite imagery are based on the same assumptions. This feature allows for multitemporal comparison of suitability results from the two approaches, enabling the evaluation of historical accuracy and updating of the cropland management plans. The ability to adjust cropland management to the classification of abiotic parameters (climate, soil, and terrain) has increased with the emergence of remote sensing satellite missions with open data access [25]. The availability of these data is expected to increase in the future, due to the lifespan of present missions and their continuous upgrade [26]. The fluctuation of temperature and soil properties, caused by climate change and inadequate agricultural management, requires constant updating of cropland suitability results for individual crops [27]. Yields targeted by farmers, based on relative comparisons with neighboring agricultural parcels, are an additional factor that may lead to increased use of fertilizers and pesticides in inherently unsuitable locations [28]. While these goals can be attained using the conventional GIS-based multicriteria analysis, more computationally efficient approaches for cropland suitability prediction using machine learning could further improve their applicability. Previous research used the analysis of prior studies and expert views to choose the abiotic criteria for cropland suitability modelling using GIS-based multicriteria analysis [11,29].
This procedure was based on the assumption that each microsite is agroecologically specific for crop cultivation [30]. The selection of the type and number of abiotic criteria for a particular site and crop type was based on the knowledge of one or more agronomic experts. The abiotic criteria were consistently divided into the three groups with the greatest effects on cropland suitability: climate, soil, and topography [15,31]. Although most of these studies focused on these criteria groups, the number of criteria varied significantly by crop type and geographic location. The exceptions to the use of climate, soil, and topography criteria were constraints representing appropriate land cover classes and categories of irrigation systems. The selection of climate criteria depends directly on the extent of the study area, and for smaller areas (such as municipalities), climate homogeneity is assumed [19]. The variability in the number of abiotic suitability criteria used and their distribution among the criteria groups indicates a high influence of human subjectivity in their selection. Although this method can enable accurate suitability modelling based on professional expertise, it may also be incorrect and biased, as it requires diverse and numerous environmental criteria in order to include all major aspects of suitability [15]. Saaty and Ozdemir [32] recommend about seven criteria for the AHP, within a range of five to nine; deviation from this number indicates the computational inefficiency of the technique. According to these suggestions, using fewer than four criteria only permits a bare minimum depiction of cropland suitability. Ten or more criteria reflect a wider variety of abiotic conditions, but they also raise the possibility of inaccuracy when subjectively assessing the relative relevance of the criteria and increase the complexity of computations.
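To make the subjective weighting step concrete, the sketch below derives AHP weights from a pairwise comparison matrix and computes Saaty's consistency ratio. It is an illustration under stated assumptions rather than the procedure of any reviewed study: the weights are approximated by power iteration on the principal eigenvector, the random index values are the commonly tabulated ones for up to nine criteria, and the example judgements for three criteria groups are hypothetical.

```python
# Commonly tabulated random index (RI) values for matrices of 1 to 9 criteria.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a reciprocal pairwise matrix."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalize to sum 1.
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [p / total for p in product]
    # Estimate the principal eigenvalue from the converged weights.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    random_index = RANDOM_INDEX[n]
    ratio = consistency_index / random_index if random_index > 0 else 0.0
    return weights, ratio


# Hypothetical expert judgements: climate vs. soil vs. topography.
pairwise = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]
weights, consistency_ratio = ahp_weights(pairwise)
```

A consistency ratio below 0.1 is conventionally taken as acceptable. Every additional criterion adds a row and column of pairwise judgements, so the number of subjective comparisons grows as n(n − 1)/2, which is one reason large criteria sets become hard to weight consistently.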
Spatial modelling of selected abiotic criteria in a GIS environment is usually performed in a raster data model [33]. The input data are usually distributed in numerous combinations of data types obtained from various institutional and scientific sources. To harmonize point vector input data into raster form, it is necessary to predict soil values at unsampled locations by spatial interpolation [34]. Selecting the optimal spatial interpolation method and parameters is necessary for reliable modelling of the input criteria; reliability decreases significantly if they are not adjusted to the characteristics of the input values [35]. The relative complexity of such modelling, as well as subjectivity in the selection of the spatial interpolation method, parameters, and classification standards, suggests that automating the process could reduce human error [36]. Automation also increases time efficiency by not requiring individual tool editing in GIS and facilitates data distribution using a globally accepted standard. Following the same principle, the developed processing framework can be easily adapted to different abiotic criteria in cropland suitability studies. Commonly used criteria in cropland suitability studies have been partially represented by global standards with subjective modifications [27] or the application of different standards with the same objective [6,37]. As one of the most common approaches to criteria selection is the analysis of previous studies, such cases lead to potential inaccuracy in the selection of value ranges in further standardization and weighting procedures. Throughout the standardization procedure, the diverse input value ranges of the modeled abiotic criteria are converted into a consistent numerical interval [17].
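As an illustration of turning point samples into a raster, the sketch below uses inverse distance weighting (IDW). IDW is chosen here only because it is compact; the text does not state which interpolation method the reviewed studies used, and the sample coordinates, values, and `power` parameter are hypothetical.

```python
import math


def idw_predict(samples, x, y, power=2.0):
    """Predict a value at (x, y) from samples given as (x, y, value) tuples."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a sample point: return the measured value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def idw_raster(samples, x_min, y_min, cell_size, n_rows, n_cols, power=2.0):
    """Interpolate to the centre of every cell of a regular grid."""
    raster = []
    for r in range(n_rows):
        y = y_min + (r + 0.5) * cell_size
        row = [idw_predict(samples, x_min + (c + 0.5) * cell_size, y, power)
               for c in range(n_cols)]
        raster.append(row)
    return raster


# Hypothetical soil samples: (x, y, measured value).
soil_samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (1.0, 2.0, 30.0)]
grid = idw_raster(soil_samples, x_min=0.0, y_min=0.0, cell_size=1.0, n_rows=2, n_cols=2)
```

The `power` parameter is exactly the kind of setting the paragraph above refers to: it controls how quickly the influence of a sample decays with distance, and a poorly chosen value degrades the interpolated surface.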
Typically, numerical intervals such as 0–1 or, more commonly, 1–5 are used, which allow for a simple representation of suitability using the five classes defined by the Food and Agriculture Organization of the United Nations (FAO). In addition to combining values expressed in different units of measurement, quantitative and qualitative data are also integrated, which is often required to determine cropland suitability [38]. Three basic standardization methods have been used in previous studies: linear stretching, stepwise standardization, and fuzzy standardization. In linear stretching, the minimum and maximum input values correspond to the limit values of the defined standardization interval. Although the linear stretching method is very simple and completely objective, it leads to unreliable standardization when the input data contain extreme values, which is often the case in suitability studies. In contrast, the stepwise standardization method is a completely subjective method based on discrete ranges of input values for a single standardized value. Thus, generalized and approximate numerical values that typically have an identical range are used to quantify the suitability level [39]. Because of the simplicity and flexibility of the method, it has found the most frequent application in previous cropland suitability studies [17]. Standardization using the fuzzy method combines the advantages of the previous two methods with continuous standardization and relative objectivity using mathematical models and the implementation of standardization thresholds based on a subjective approach [40]. The alternatives in the choice of fuzzy logic mathematical models (linear, S-shaped, J-shaped, and G-shaped) allow for additional flexibility in standardization. Nevertheless, fuzzy logic methods are used much less frequently in suitability studies compared with stepwise standardization.
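The three standardization methods can be contrasted in a few lines. This is a minimal sketch, not the formulation of any cited study: the class breaks, thresholds, and precipitation values are hypothetical, and the cosine-based curve is just one possible S-shaped membership function.

```python
import math
from bisect import bisect_right


def linear_stretch(values):
    """Map the minimum to 0 and the maximum to 1, linearly in between."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("values must not all be equal")
    return [(v - low) / (high - low) for v in values]


def stepwise(value, breaks, scores):
    """Assign a discrete score; breaks are ascending, len(scores) == len(breaks) + 1."""
    return scores[bisect_right(breaks, value)]


def fuzzy_s_shaped(value, lower, upper):
    """Increasing S-shaped membership: 0 below lower, 1 above upper, smooth between."""
    if value <= lower:
        return 0.0
    if value >= upper:
        return 1.0
    return 0.5 * (1.0 - math.cos(math.pi * (value - lower) / (upper - lower)))


# Hypothetical annual precipitation (mm) with one extreme value.
precipitation = [200.0, 400.0, 600.0, 3000.0]
stretched = linear_stretch(precipitation)  # the extreme value compresses the rest below 0.15
classes = [stepwise(v, [300.0, 500.0, 700.0, 900.0], [1, 2, 3, 4, 5]) for v in precipitation]
memberships = [fuzzy_s_shaped(v, 300.0, 700.0) for v in precipitation]
```

The example shows the trade-offs described above: linear stretching is objective but distorted by the outlier, stepwise standardization depends entirely on the chosen breaks, and the fuzzy function is continuous while still relying on subjectively set thresholds.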
There is currently no extensive research that outlines the precise impact of standardization methods on the accuracy of suitability results; therefore, users choose a standardization method based only on their subjective preferences. The comparative evaluation of these standardization methods has shown that the variety of available methods in complex GIS-based multicriteria analysis should be evaluated more thoroughly in future studies.

3. Recent Developments in Machine-Learning-Based Cropland Suitability Prediction

Given the disadvantages of the conventional GIS-based multicriteria analysis with AHP, machine learning methods have already enabled researchers to provide more computationally efficient, objective, and reliable cropland suitability prediction (Figure 4). Machine learning has been used to address both the subjectivity and the difficulty of including environmental data in the GIS context. It facilitates the integration and processing of the many forms of big data, and it models intricate nonlinear relationships between training data and independent predictors (covariates) [16]. Machine learning allows for a fully automated and objective determination of feature importance, as opposed to the manual and subjective computation of weights of specific abiotic criteria in the suitability result [41].
Figure 4. The comparative generalized workflows of the conventional and novel machine learning approaches for cropland suitability prediction.
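One model-agnostic way to obtain feature importance automatically is permutation importance: shuffle one covariate at a time and measure how much the prediction error grows. The sketch below assumes this technique for illustration only, since the text does not specify which importance measure the reviewed studies used. The `predict` function stands in for any trained model, and the covariates and target values are hypothetical.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Mean increase in error when each covariate column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(row) for row in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            error = mean_squared_error(targets, [predict(row) for row in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical covariates per pixel: [mean_july_temperature, soil_clay_content].
rows = [[float(i), float((i * 7) % 10)] for i in range(20)]
# Hypothetical target (e.g. a vegetation index) that depends only on the first covariate.
targets = [2.0 * row[0] for row in rows]


def predict(row):
    """Stand-in for a trained model; here it reproduces the target exactly."""
    return 2.0 * row[0]


importances = permutation_importance(predict, rows, targets)
```

No expert weighting enters the calculation: the ranking of covariates follows from the data and the fitted model alone, which is the contrast with AHP that the paragraph above draws.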
Two general approaches were mainly improved using machine learning in cropland suitability prediction studies:
  • Computationally efficient suitability assessment methods using global satellite missions with a high (e.g., Sentinel-2, Landsat 8) and medium spatial resolution (e.g., Sentinel-3, PROBA-V). This approach ensures the applicability of the accuracy assessment for predicted cropland suitability, otherwise commonly omitted from the conventional approach. The excessive subjectivity of the GIS-based multicriteria analysis with AHP has been independently evaluated using this globally available remote sensing open data. These methods provide a scientific contribution to the training/test data component of the suitability prediction.
  • Suitability prediction methods based on machine learning algorithms and globally available spatial data that provide high prediction reliability with lower user subjectivity compared with the GIS-based multicriteria analysis. Aside from enabling the inclusion of significantly more environmental covariates in the suitability prediction without impairing computational efficiency, exact and specific abiotic criteria become accessible. In contrast with the generalized and vague criteria (e.g., “precipitation”, “temperature”, or “soil texture”), these methods included specific relevant environmental abiotic criteria, such as the mean air temperature in individual months or soil clay, silt, and sand contents in narrow soil depth layers.
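The accuracy assessment mentioned in the first approach can be sketched with two common regression metrics, the coefficient of determination (R²) and the root mean square error (RMSE). These metrics are an assumption made for illustration, and the observed and predicted values below are hypothetical stand-ins for a satellite-derived reference (such as a vegetation index) on held-out test pixels.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between reference and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical test-set values on a 0-1 scale.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
test_rmse = rmse(observed, predicted)
test_r2 = r_squared(observed, predicted)
```

Reporting such metrics on data withheld from training gives the quantitative accuracy statement that the conventional AHP-based workflow typically lacks.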