2. Deep Learning
Developing a reliable crop yield prediction model with traditional approaches such as the static regression approach and the mechanistic approach is difficult due to their limited applicability and uncertainty
[9,10][5][6]. Many studies have used machine learning approaches such as regression tree, random forest, multivariate regression, association rule mining, and artificial neural networks for crop yield prediction
[9,11][5][7]. Machine learning models treat the output, crop yield, as an implicit function of input variables such as weather components and soil conditions, and this function can be very complex
[11][7]. Moreover, many conventional supervised learning approaches in machine learning struggle to capture the nonlinear relationship between input and output variables
[12][8]. However, technological advances in recent years have made it possible to develop more capable crop yield prediction models using deep learning. Deep learning is a class of machine learning that arranges its layers hierarchically, with each layer feeding the next, and its capability to analyze both unlabeled and unstructured data sets it apart from traditional machine learning approaches
[13][9]. Deep learning is broadly used in the agricultural field because it can analyze huge datasets, learn the relationships between variables, and model nonlinear functions. These approaches can extract features from huge datasets in an unsupervised setting. Compared with traditional machine learning approaches, deep learning approaches perform better in feature extraction
[13][9]. Because accurate crop yield prediction relies on the factors influencing crop growth, the strong ability of deep learning to extract features from the available data makes it well suited to this task.
Deep neural networks consist of a stack of nonlinear layers, each of which converts its input, starting from the raw data, into a more abstract representation
[14][10]. Deep neural networks with multiple hidden layers are important for discovering the nonlinear correlation between input and response variables
[14][10]. Nevertheless, they are difficult to train and need modern hardware and optimization methods
[15][11]. Thus, increasing the number of hidden layers can be effective, but it brings training difficulties that specific techniques can mitigate. For example, the vanishing gradient problem in deeper neural networks can be reduced by adding residual skip connections to the network
[15,16,17][11][12][13]. Moreover, the performance of deep learning approaches has been improved by techniques such as stochastic gradient descent (SGD), batch normalization, and dropout.
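The idea of a residual skip connection can be illustrated with a minimal sketch. It is written in plain Python rather than a deep learning framework, and the function names, the two-layer form of the inner transformation, and the placement of the final activation are assumptions made for illustration (one common arrangement among several), not the architecture of any cited study.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(values, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, bias)
    ]


def residual_block(x, w1, b1, w2, b2):
    """Return relu(x + F(x)), where F is dense -> relu -> dense.

    The `x +` term is the skip connection: it carries the input past F
    unchanged, so gradients have a direct path through the block.
    """
    hidden = relu(dense(x, w1, b1))
    transformed = dense(hidden, w2, b2)
    return relu([xi + ti for xi, ti in zip(x, transformed)])


# Hypothetical 2-unit block with all-zero weights: F(x) is zero, so the
# block reduces to relu(x) and a non-negative input passes through as is.
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity_output = residual_block([1.0, 2.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0])
```

Because the block adds its input back to the transformed signal, a layer whose weights contribute nothing still passes information forward, which is the property that eases the training of deeper stacks.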
3. Remote Sensing for Data Acquisition
Crop yield can vary according to environmental factors, climatic conditions, disease, and other parameters. Crop growth at each stage is influenced by these factors, which in turn affects crop production
[10][6]. Environmental factors, other parameters, and crop growth can be monitored using various instruments and methodologies, such as ground observation, remote sensing, global positioning systems, and on-field surveying. With ground observation and other traditional methods, it is challenging to collect data in person over a large area, and the results tend to be less accurate and less reliable
[10][6]. To overcome this limitation, remote sensing is increasingly used for crop monitoring.
Remote sensing techniques provide details about the status of crops at various growth stages through the use of spectral signatures, at a much lower cost than extensive on-field surveying. Remote sensing technology is the acquisition and analysis of information about the world and its objects by an instrument carried on an airborne platform or a satellite, without any physical contact
[25][14]. Remote sensing can produce a large amount of data compared with other data acquisition techniques such as field surveying
[26][15]. It is the process of monitoring and recognizing places on Earth by measuring the emitted and reflected radiation with the help of sensors
[27][16]. Data acquired using remote sensing have several applications in agriculture, which include crop type classification, crop yield prediction, soil property detection, crop health monitoring, weather data assessment, and soil moisture retrieval
[28][17].
One of the most important reasons to use optical remote sensing for acquiring crop information is that it enables the computation of vegetation indices. Combinations of spectral measurements at various wavelengths are known as spectral indices. They are employed to derive vegetation phenology and calculate biophysical parameters
[29][18]. Among the various spectral indices, vegetation indices are the ones most widely used in crop yield prediction. Healthy crops are indicated by strong absorption in the red band and strong reflectance in the near-infrared band
[30][19]. The strong contrast between red absorption and near-infrared reflectance can be combined into various quantitative indices of vegetation condition. These linear or nonlinear band combinations are known as vegetation indices (VIs)
[31,32][20][21]. Examples of VIs include the normalized difference vegetation index (NDVI), green vegetation index (GVI), and chlorophyll absorption ratio index (CARI), among many others.
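As a concrete case, the NDVI is the standard normalized difference of near-infrared and red reflectance, NDVI = (NIR - Red) / (NIR + Red). The sketch below computes it for two pixels; the reflectance values are made up for illustration, and returning NaN for an undefined ratio is a convention chosen here, not a requirement of the index.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    denominator = nir + red
    if denominator == 0:
        # Undefined ratio (no signal in either band); NaN is an assumed convention.
        return math.nan
    return (nir - red) / denominator


# Hypothetical surface reflectances (fractions between 0 and 1), not measured data.
dense_canopy = ndvi(nir=0.50, red=0.05)  # strong NIR reflectance, strong red absorption
bare_soil = ndvi(nir=0.30, red=0.25)     # similar reflectance in both bands
```

With these illustrative inputs the canopy pixel scores far higher than the soil pixel, which mirrors the red and near-infrared contrast described above.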
4. Impact of Vegetation Indices and Environmental Factors
Vegetation indices are formulated to maximize sensitivity to vegetation characteristics while minimizing confounding factors such as soil background reflectance and directional or atmospheric effects. Most vegetation indices use information from the red and near-infrared (NIR) canopy reflectances or radiances
[33][22]. A satellite multispectral sensor with several bands covering the visible, near-infrared, and short-wave infrared wavelength regions allows numerous vegetation indices to be computed.
Different types of vegetation indices have been designed by various researchers and are extensively used in several research areas. Although the proposed indices differ in some respects, all of them are sensitive to biochemical attributes and biophysical parameters such as leaf angle distribution function, leaf chemical contents, fraction of absorbed photosynthetically active radiation, biomass, fraction of green coverage, and leaf area index (LAI). Because of the strong correlation between vegetation indices and biophysical parameters, they are widely used to determine the nutritional level of plants, mostly with respect to nitrogen
[34,35][23][24], to classify vegetation, and to schedule crop management. However, their accuracy depends on the phenological stage at which they are evaluated and on the type of index used. Zhao et al.
[35][24] developed a function relating the crop coefficient (Kc), which is used in irrigation management, to the vegetation index, and applied it to water conservation. Other significant areas where these indices are used are the estimation of crop yield, protein content, and biomass, as well as weed management and fertilizer management
[36,37,38][25][26][27]. Some of the most commonly used indices are the normalized difference vegetation index (NDVI), green vegetation index (GVI), enhanced vegetation index (EVI), and chlorophyll absorption ratio index (CARI).
Various studies have investigated how the correlation between remotely sensed data and crop yield changes over the course of the growing season
[39,40][28][29]. These studies indicate that the relationship between vegetation indices and crop yield varies during the crop growth cycle and is not consistent across growth stages
[41,42,43,44][30][31][32][33]. For example, the phenological growth stages of wheat that are suitable for obtaining spatial yield data from satellite remote sensing are stem elongation, heading, and the development of fruit until early ripening
[45,46,47][34][35][36]. Similarly, Ali et al.
[48][37] proposed that, for oat grain, the appropriate crop growth stage for studying the correlation between vegetation index and crop yield, and for estimating crop yield and biomass, was the appearance stage of the leaf health and kernel watery ripe stage. Most previous studies have performed correlation analyses between vegetation indices, soil data, and yield data, and were mostly limited to particular crop types, specific years, and a restricted number of vegetation indices
[48,49,50,51][37][38][39][40]. The ultimate goal in correlation analysis between vegetation indices and crop yield data is to develop an optimal crop yield prediction model
[52,53,54][41][42][43].
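A stage-wise correlation analysis of this kind can be sketched as follows. The Pearson coefficient is computed between per-field yields and the vegetation index observed at each growth stage, and the stage with the strongest absolute correlation is reported. The stage labels, index values, and yields are invented for illustration and do not come from any of the cited studies; `pearson` and `best_stage` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_stage(vi_by_stage, yields):
    """Return the stage whose index values correlate most strongly with yield."""
    scores = {stage: pearson(values, yields) for stage, values in vi_by_stage.items()}
    strongest = max(scores, key=lambda stage: abs(scores[stage]))
    return strongest, scores


# Made-up values for four fields: yield per field and the index at two stages.
field_yields = [3.1, 4.0, 4.8, 5.5]
vi_by_stage = {
    "stage_a": [0.30, 0.35, 0.28, 0.33],
    "stage_b": [0.55, 0.63, 0.71, 0.78],
}
strongest_stage, stage_scores = best_stage(vi_by_stage, field_yields)
```

In practice such an analysis would be repeated per crop, per season, and per index, which is why the results of individual studies are hard to generalize.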
For developing a workable crop yield prediction model, it is important to determine the appropriate vegetation indices and environmental factors
[55,56][44][45]. You et al.
[57][46] predicted corn yield using the greenness index with 90% accuracy. Fernandes et al.
[58][47] noted that crop yield prediction is influenced by the selection of vegetation indices. Their study on maize yield prediction showed that NDVIre, NDVI, and GNDVI performed well in capturing field variability. Haghverdi et al.
[59][48] observed that crop yield prediction with NDVIre was more effective when compared to NDVI and GNDVI. Wang et al.
[60][49] estimated the corn yield by combining vegetation indices such as the normalized difference vegetation index (NDVI) and Absorbed Photosynthetically Active Radiation (APAR) with environmental factors including canopy surface temperature and water stress index
[56][45]. Other features, such as humidity, nutrients, and soil information, are also used in crop yield prediction. Although many features are already in use, few investigations have tried to identify which specific features have the greatest impact on crop yield prediction. Hence, detailed research is essential to gain a better overview of the variables and factors influencing crop yield prediction.
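The three indices compared in the studies above share one form and differ only in the band paired with the near-infrared band: red for NDVI, green for GNDVI, and red edge for NDVIre. The sketch below makes that relationship explicit. It assumes the usual normalized difference definitions, and the band keys and reflectance values are hypothetical.

```python
def normalized_difference(band_a, band_b):
    """Generic normalized difference (a - b) / (a + b); NaN if undefined."""
    denominator = band_a + band_b
    return (band_a - band_b) / denominator if denominator else float("nan")


def nir_based_indices(bands):
    """Compute NDVI, GNDVI, and NDVIre from a mapping of band reflectances.

    The keys "nir", "red", "green", and "red_edge" are assumed names,
    not tied to any particular sensor product.
    """
    nir = bands["nir"]
    return {
        "NDVI": normalized_difference(nir, bands["red"]),
        "GNDVI": normalized_difference(nir, bands["green"]),
        "NDVIre": normalized_difference(nir, bands["red_edge"]),
    }


# Illustrative reflectances for a single pixel, not measured data.
sample_pixel = {"nir": 0.50, "red": 0.05, "green": 0.10, "red_edge": 0.30}
sample_indices = nir_based_indices(sample_pixel)
```

Computing the candidates side by side in this way is a natural first step when deciding which index to feed into a yield prediction model.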