Crop diseases constitute a serious issue in agriculture, affecting both the quality and the quantity of agricultural production. Disease control has been a research topic in many scientific and technological domains. Technological advances in sensors, data storage, computing resources and artificial intelligence have shown enormous potential to control diseases effectively. A growing body of literature recognizes the importance of combining data from different types of sensors with machine learning approaches to build models for disease detection, prediction, analysis and assessment. However, the increasing number and diversity of research studies call for a literature review to guide further developments and contributions in this area.
According to the FAO, pest attacks and plant diseases are two of the main causes of declining food availability and food hygiene. Depending on the disease and its development stage, crop damage ranges from simple physiological defects to plant death. In addition to biological agents, physical agents such as abrupt climate change can cause diseases and harm the plant. Conventional methods for detecting and locating plant diseases include direct visual diagnosis, i.e., identifying the disease symptoms that appear on plant leaves, and chemical techniques that involve molecular tests on plant leaves.
In recent years, promising approaches for detecting and locating diseases have been proposed using automatic monitoring and recognition systems. Advances in sensor technologies and data processing have opened new perspectives for the detection and diagnosis of crop anomalies. Disease surveillance can be performed by capturing data from the soil and plant cover with remote sensing (RS) or ground equipment, and by developing and testing machine learning algorithms on these data. Implementing management practices with smart algorithms helps optimize profitability, sustainability and the protection of land resources.
Furthermore, agricultural fields can be equipped with multiple sensors that measure environmental characteristics and plant canopy properties, complemented by leaf indices extracted from remote sensing imagery and by IoT sensors. Given the variety of extracted data, data fusion techniques are required to combine these data types to better understand crop growing conditions and the development of disease symptoms. In addition, machine learning-based data fusion has developed considerably and, applied to agricultural data, could have a great impact on plant protection, in particular on disease detection and early disease detection. Several multi-sensor and remote sensing-based fusion techniques have therefore been used in agriculture for this purpose.
Crop ground imaging is the technique of acquiring images of crop fruits and leaves at ground level using smartphones or digital cameras. Since visual symptoms on crops and plant leaves are important for disease detection, researchers have captured plant leaves in field conditions, which raises the challenge of dealing with complex backgrounds, shadows and unstable luminosity. Moreover, diseases affect the spectral characteristics of plants, which has led researchers to invest in distinguishing infected from uninfected leaves and in classifying disease severity degrees, both from visual symptoms and even before visual symptoms appear. A modern approach to disease detection relies on machine learning algorithms to explore data from different acquisition systems:
Traditional machine learning algorithms have been used for disease detection. Support vector machine (SVM) models are commonly used for plant disease detection because of their prediction efficiency. Similarly, approaches that manually extract lesion characteristics and combine multiple SVM classifiers (on color, texture and shape characteristics) to recognize diseases on plant leaves have been proposed to reduce misclassification. Statistical analysis of some indices with principal component analysis (PCA) successfully differentiated healthy plants from infected ones as golden potato disease progressed.
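To illustrate how PCA can separate healthy from infected plants using a few spectral indices, the following minimal pure-Python sketch computes the first principal component of a small table of per-plant index values by power iteration on the sample covariance matrix, then projects each plant onto it. The (NDVI, PSRI) values and the helper names `covariance_matrix`, `first_principal_component` and `project` are hypothetical illustrations, not the cited study's data or code.

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance matrix and column means of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        centered = [r[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return cov, means


def first_principal_component(cov, iterations=200):
    """Dominant eigenvector (unit length) of a symmetric matrix by power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows, means, component):
    """Score of each mean-centred row along the given component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(r))) for r in rows]


# Hypothetical (NDVI, PSRI) values: three healthy plants, then three infected ones.
samples = [[0.82, 0.02], [0.80, 0.03], [0.78, 0.04],
           [0.55, 0.15], [0.50, 0.18], [0.48, 0.20]]
cov, means = covariance_matrix(samples)
pc1 = first_principal_component(cov)
scores = project(samples, means, pc1)
```

The healthy and infected groups end up with scores of opposite sign along the first component; the sign of the component itself is arbitrary, so only the separation matters.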
Deep learning models were then used to improve prediction quality and to address a wider range of diseases and crops. This subsection presents deep learning models applied to RGB, multispectral and hyperspectral images for early disease detection.
In , the authors tested several deep learning models, trained from scratch and with transfer learning. The approach outperformed the conventional method based only on visual information, with an accuracy of 98%. Likewise, in , a smaller dataset of infected tomato leaf images was divided into classes of different pest attacks and plant diseases. The detection method achieved an accuracy of 95.65% using DenseNet161 with transfer learning.
To make disease detection more practical for farmers, researchers developed mobile applications based on deep learning models adapted to the computing capacity and energy constraints of mobile devices. One such model was able to distinguish tomato leaf diseases through image recognition with an accuracy of up to 89.2%. In , MobileNet was tested for citrus disease detection and compared to another CNN model, the self-structured CNN (SSCNN) classifier. The results showed that SSCNN was more accurate for citrus leaf disease classification on mobile phone images.
In a pre-symptomatic disease detection task exploiting hyperspectral images, the authors in  used an extreme learning machine (ELM) classifier on the full wavelength range of hyperspectral tomato leaf images. They selected the effective wavelengths, which carry the most disease information, to avoid the convergence instability that highly correlated bands cause in predictive models. In the same context,  developed a method to detect fusarium head blight in wheat using hyperspectral images and a specific acquisition protocol that accounts for field conditions. The authors were able to classify infected and healthy wheat heads using hyperspectral images.
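The idea of keeping informative but weakly correlated wavelengths can be sketched as a simple greedy filter: rank the bands by a relevance score, then accept a band only if its absolute Pearson correlation with every band already kept stays below a threshold. This is a simplified stand-in, not the wavelength selection procedure of the cited studies; the relevance scores, the 0.9 threshold and the toy reflectances are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_bands(spectra, relevance, max_corr=0.9):
    """Greedily keep relevant bands whose |r| with every kept band is below max_corr.

    spectra:   dict wavelength (nm) -> reflectance of that band across samples
    relevance: dict wavelength (nm) -> relevance score (higher = more disease information)
    """
    selected = []
    for band in sorted(spectra, key=lambda b: relevance[b], reverse=True):
        if all(abs(pearson(spectra[band], spectra[kept])) < max_corr for kept in selected):
            selected.append(band)
    return selected


# Toy reflectances over five leaf samples; 555 nm is a shifted copy of 550 nm.
spectra = {
    550: [0.10, 0.12, 0.11, 0.15, 0.14],
    555: [0.11, 0.13, 0.12, 0.16, 0.15],
    680: [0.05, 0.09, 0.04, 0.08, 0.06],
    800: [0.40, 0.42, 0.41, 0.43, 0.40],
}
relevance = {550: 0.6, 555: 0.5, 680: 0.8, 800: 0.9}
bands = select_bands(spectra, relevance)
```

Here the redundant 555 nm band is dropped because it is almost perfectly correlated with 550 nm, which is exactly the kind of collinearity that destabilizes predictive models.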
Ground imagery is an interesting technology in smart farming. Deep models using this type of acquisition achieve high detection accuracy thanks to close-range, high-resolution leaf imaging. Table 2 summarizes the effective wavelengths used for disease detection with the close-range imaging approaches presented in this section. However, this strategy cannot monitor and diagnose plant diseases at a large scale. Moreover, these techniques are time-consuming when applied to wide study areas.
| Effective wavelengths | Vegetation indices |
|---|---|
| 697.44, 639.04, 938.22, 719.15, 749.90, 874.91, 459.58 and 971.78 nm | - |
| Full range 750–1350 nm | - |
| 665 and 770 nm | SR |
| 670, 695, 735 and 945 nm | NDVI |
| 655, 746 and 759–761 nm | - |
| 445, 500, 680, 705 and 750 nm | RENDVI, PSRI |
| 442, 508, 573, 696 and 715 nm | - |
| Full range 400–1000 nm | - |
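Several rows of Table 2 pair wavelengths with vegetation indices. The sketch below computes those indices from reflectance values using their standard definitions: SR = R_NIR / R_red, NDVI = (R_NIR - R_red) / (R_NIR + R_red), RENDVI = (R750 - R705) / (R750 + R705) and PSRI = (R680 - R500) / R750. The 770/665 nm pair for SR follows the table; reusing that pair for NDVI and the leaf reflectance values themselves are assumptions for illustration only.

```python
def simple_ratio(nir, red):
    """SR = R_NIR / R_red."""
    return nir / red


def ndvi(nir, red):
    """NDVI = (R_NIR - R_red) / (R_NIR + R_red), bounded in [-1, 1]."""
    return (nir - red) / (nir + red)


def rendvi(r750, r705):
    """Red-edge NDVI = (R750 - R705) / (R750 + R705)."""
    return (r750 - r705) / (r750 + r705)


def psri(r680, r500, r750):
    """Plant senescence reflectance index = (R680 - R500) / R750."""
    return (r680 - r500) / r750


# Hypothetical leaf reflectances (fraction of incident light), keyed by wavelength in nm.
leaf = {500: 0.06, 665: 0.05, 680: 0.04, 705: 0.15, 750: 0.45, 770: 0.50}

indices = {
    "SR": simple_ratio(leaf[770], leaf[665]),      # wavelength pair listed in Table 2
    "NDVI": ndvi(leaf[770], leaf[665]),            # same pair, assumed for illustration
    "RENDVI": rendvi(leaf[750], leaf[705]),
    "PSRI": psri(leaf[680], leaf[500], leaf[750]),
}
```

Narrow-band indices such as RENDVI and PSRI are only computable when the sensor resolves the specific wavelengths they use, which is one reason effective-wavelength selection matters for close-range imaging.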
UAVs are also exploited as a precision agriculture (PA) solution for monitoring and controlling crop growth, possible disease development and weed detection, thanks to their ability to collect higher-resolution images at lower cost. UAVs equipped with embedded cameras and sensors perform efficient field data acquisition for field-scale visualization and analysis. Additional elements, such as the choice of appropriate sensors and intelligent recognition models, can enhance the performance of crop monitoring techniques. Because diseases alter the spectral response of plants, multispectral cameras are the sensors most often used in disease detection studies.
For systems using these types of sensors, large amounts of data are first stored in large-scale databases provided by information systems such as geographic information systems (GIS), which enable the visualization and analysis of these data. The collected data provide information on soil and vegetation cover characteristics, such as soil organic content and soil moisture, biomass quantity, presence of weeds and early crop stress, with possible evaluation of the disease stage.
Traditional machine learning algorithms have been used for plant disease detection from UAV images. One of the first models attempting to predict infection severity on plants from images is the backpropagation neural network (BPNN), for which the authors extracted spectral data from remotely sensed hyperspectral images of tomato plants. In , the authors adopted a segmentation approach based on Simple Linear Iterative Clustering (SLIC) to detect soybean foliar diseases. In , UAV images were used to detect citrus canker at several disease development stages.
To conclude, the performance of traditional machine learning approaches is limited and can vary considerably across growing periods and acquisition equipment. This low performance can also stem from the feature engineering process, which can cause significant information loss.
Deep learning models have also been developed to tackle the limitations of traditional machine learning for plant disease detection from UAV images. For example, with the aim of detecting disease symptoms in grape leaves, the authors used a CNN approach based on a relevant combination of image features and color spaces. The classification model combined deep convolutional generative adversarial networks (DCGANs) and an AdaBoost classifier. The first step consisted of overlaying the two types of images using an optimized image registration, and the resulting images were processed with a semantic segmentation approach (SegNet architecture) to delineate and detect the vine symptoms.
Satellites, by contrast, cover wider land areas and offer historical images of the study area, depending on their acquisition frequency. They provide multispectral images whose spatial resolution ranges from very high (about 0.5 m) to more than 30 m. Conversely, satellites with high temporal resolution have coarse spatial resolution. For instance, the MODIS sensor on the Terra and Aqua satellites collects daily images.
Several machine learning methods have been used for land monitoring from satellite images, for instance for mapping urban fabric, classifying crops, delineating field boundaries and detecting pests.
Traditional machine learning has been used to test the suitability of satellite images for disease detection. In one such study, the types of stress detected were pest and disease stress, heavy metal stress, or a double stress combining the first two. SVM was deployed in  for disease detection in winter wheat. In , the naive Bayes algorithm was tested for disease detection on spectral signatures of coffee berry necrosis derived from Landsat 8 OLI satellite images; the classification reached an accuracy of 50%.
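The naive Bayes classifier mentioned above assumes that spectral bands are conditionally independent given the class. A minimal Gaussian naive Bayes sketch over per-pixel band reflectances could look like the following; the three-band feature layout, the class labels, the variance floor and the reflectance values are hypothetical, and the cited study may have used a different likelihood model.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Gaussian naive Bayes: one independent normal likelihood per band and class."""

    def __init__(self, var_floor=1e-9):
        self.var_floor = var_floor  # avoids division by zero for constant bands

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params = {}
        for label, rows in groups.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            variances = [max(sum((v - m) ** 2 for v in col) / n, self.var_floor)
                         for col, m in zip(zip(*rows), means)]
            self.params[label] = (math.log(n / len(X)), means, variances)
        return self

    def _log_posterior(self, row, label):
        """Unnormalised log posterior: log prior + sum of per-band log likelihoods."""
        log_prior, means, variances = self.params[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                      for x, m, var in zip(row, means, variances))
        return log_prior + log_lik

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_posterior(row, c)) for row in X]


# Hypothetical (green, red, NIR) reflectances of healthy and diseased pixels.
X = [[0.08, 0.05, 0.45], [0.09, 0.04, 0.48], [0.07, 0.05, 0.44],
     [0.10, 0.09, 0.30], [0.11, 0.10, 0.28], [0.10, 0.11, 0.31]]
y = ["healthy"] * 3 + ["diseased"] * 3
model = GaussianNaiveBayes().fit(X, y)
predictions = model.predict([[0.08, 0.05, 0.46], [0.11, 0.10, 0.29]])
```

Working in log space keeps the product of many small per-band likelihoods numerically stable; the independence assumption is also a plausible reason for weak results when neighbouring bands are strongly correlated.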
Deep learning has also shown high performance for disease detection from satellite images. In , the authors proposed a gated recurrent unit (GRU)-based model to predict the development of sudden death syndrome (SDS) in soybean quadrats. However, satellite images with a spatial resolution of 10 m or coarser are barely sufficient for crop classification, which makes disease detection even more challenging. To compensate for the lack of data and improve prediction, several analysts recommended combining satellite images with aerial images and other data sources, such as wireless sensor networks (WSNs) that capture environmental parameters, for disease detection.
A typical wireless monitoring system consists of multiple sensors in each zone connected to an installed node, with sensors and nodes communicating via radio frequency. When a WSN is unavailable, one alternative is a weather station, which provides various local real-time measurements for agricultural applications. Several studies have collected wireless sensor network data for disease detection. Nevertheless, the classic methods used for disease detection are limited, and it is more effective to take advantage of machine learning algorithms to build efficient prediction models.
In one study, sensors for temperature, relative humidity and leaf humidity were placed in a vineyard to collect the necessary data. In another, the prediction was based on multiple parameters measured in the field, namely atmospheric temperature, atmospheric humidity, CO2 concentration, illumination intensity, soil moisture, soil temperature and leaf wetness; the model achieved promising results, confirming the value of environmental data for early disease detection. Since abiotic factors such as temperature, soil moisture and humidity help determine whether a plant is growing in healthy conditions, another system used only two sensors: a soil moisture sensor and a temperature-humidity sensor.
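Raw hourly WSN readings are usually aggregated into a few daily features before being fed to a prediction model. The sketch below computes mean temperature, mean relative humidity, leaf-wetness hours and wet hours within a temperature band. The dictionary keys, the 50% wetness threshold and the 15-25 °C band are hypothetical choices for illustration; real thresholds depend on the pathogen and are not taken from the studies above.

```python
from statistics import mean


def daily_features(readings, wet_threshold=50.0, temp_range=(15.0, 25.0)):
    """Summarise one day of hourly WSN readings into candidate model inputs.

    readings: list of dicts with keys 'temp' (°C), 'rh' (%) and 'leaf_wetness' (%).
    wet_threshold and temp_range are hypothetical, disease-agnostic defaults.
    """
    wet = [r for r in readings if r["leaf_wetness"] >= wet_threshold]
    wet_and_mild = [r for r in wet if temp_range[0] <= r["temp"] <= temp_range[1]]
    return {
        "mean_temp": mean(r["temp"] for r in readings),
        "mean_rh": mean(r["rh"] for r in readings),
        "wet_hours": len(wet),
        "wet_hours_in_temp_range": len(wet_and_mild),
    }


# Hypothetical day: 8 wet night hours (4 at 14 °C, 4 at 16 °C) and 16 dry hours at 24 °C.
readings = ([{"temp": 14.0, "rh": 95.0, "leaf_wetness": 80.0}] * 4
            + [{"temp": 16.0, "rh": 95.0, "leaf_wetness": 80.0}] * 4
            + [{"temp": 24.0, "rh": 60.0, "leaf_wetness": 10.0}] * 16)
features = daily_features(readings)
```

Combining wetness duration with temperature captures the interaction that many infection-risk models rely on, which a plain daily mean would hide.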
Deep learning: In , the authors developed an approach to predict the occurrence of cotton diseases and pests. A bidirectional long short-term memory (Bi-LSTM) network was used for prediction; it achieved an accuracy of 87.84% and an overall area under the curve (AUC) of 0.95. Nevertheless, we noticed that few IoT studies have addressed disease detection with machine learning, possibly because these data alone are not sufficient to predict crop health status. When coupled with other types of data, however, these inputs can provide valuable results, provided that appropriate fusion techniques and adequate AI models are used to handle such complex multivariate data.
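The AUC reported above summarizes how well a model ranks positive cases (e.g., disease or pest occurrence) above negative ones, independently of any decision threshold. A minimal sketch of its computation from predicted scores, using the Mann-Whitney formulation (ties count as half a win), is shown below; the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC = probability that a random positive outranks a random negative.

    labels: 1 for positive (e.g. occurrence), 0 for negative.
    Ties count as half a win. O(P * N), which is fine for small evaluation sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for six field observations.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

Here one positive (score 0.4) is ranked below one negative (score 0.6), so 8 of the 9 positive/negative pairs are correctly ordered.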
Connected sensor networks are among the most innovative technologies in plant protection, since variations in microclimatic conditions are correlated with plant stress. Numerous research studies have been carried out to control and monitor crops and to predict plant health from meteorological characteristics. In addition, images can better represent the crop health state: ground images, UAV images and satellite images have proven effective in detecting plant diseases.
We noticed that a new trend is spreading widely in disease detection applications: the use of deep learning. DL eliminates the manual feature extraction phase, which can sometimes lead to low prediction performance, and therefore requires less feature engineering effort. In addition, DL models have efficiently classified diseases in challenging environments with complex backgrounds and overlapping plant leaves. Conversely, traditional machine learning cannot effectively distinguish disease symptoms with similar characteristics, nor can it take full advantage of larger training datasets.
Data sources can provide useful information about the phenomena studied; for a unimodal data source, simple data concatenation can be enough for prediction purposes. Otherwise, when several types of sensors are involved, advanced data fusion is necessary. Data from several sensors first require analysis to characterize, order or correlate the available data sources, before deciding on the strategy or algorithm used to merge them. The relationships that can exist between sources include distribution, complementarity, heterogeneity, redundancy, contradiction, concordance, discordance, synchronization and difference in granularity.
In the literature, data fusion methods are divided into three main categories: probability-based methods, evidence-based methods and knowledge-based methods. Probability-based methods, such as the Kalman filter, Bayesian fusion and the hidden Markov model, are limited to low-dimensional or homogeneous data and suffer from high computational complexity, which makes them inadequate for complex problems. Evidence-based methods, such as Dempster-Shafer theory, are used to deal with missing information and additional assumptions and to handle uncertainty.
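As an example of an evidence-based method, Dempster's rule of combination fuses two mass functions defined over subsets of a frame of discernment: masses of intersecting focal elements are multiplied, mass falling on the empty set is treated as conflict K, and the remainder is renormalized by 1 - K. Mass assigned to the whole frame expresses ignorance, which is how missing or uncommitted information is represented. The two sources and their masses below are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps frozensets of hypotheses (focal elements) to masses
    summing to 1. Returns (combined masses, conflict K).
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # the sources support incompatible hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; Dempster's rule is undefined")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}, conflict


H, D = frozenset({"healthy"}), frozenset({"diseased"})
THETA = H | D  # whole frame: mass here means "no commitment"

# Hypothetical evidence: an image classifier and a weather-based risk model.
m_camera = {D: 0.6, THETA: 0.4}
m_weather = {D: 0.3, H: 0.2, THETA: 0.5}
fused, conflict = dempster_combine(m_camera, m_weather)
```

The fused belief in "diseased" rises to about 0.68 while part of the mass stays on the whole frame, so the remaining uncertainty is kept explicit instead of being forced into a single class.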
Multimodal fusion based on machine learning can learn representations of different modalities at various levels of abstraction, with significantly improved performance. Multimodal fusion approaches can be split into two main categories: model-based approaches, which explicitly address fusion in their construction, and model-agnostic approaches, which are general and flexible and do not depend directly on a specific machine learning method. Depending on the level of data abstraction, several architectures are possible for model-agnostic fusion.
Measurement fusion (or early fusion), also known as first-level data fusion, integrates sensor data immediately by combining them into feature vectors. The data are generally concatenated, which limits this type of fusion when dealing with heterogeneous data. This architecture is nevertheless the most widely used because of its simplicity and the ease of aligning the data. In , the authors aimed to predict the rate of photosynthesis and calculate the optimal CO2 concentration from real-time environmental information collected by a WSN in greenhouses during the tomato seedling stage.
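Early fusion can be sketched in a few lines: each modality is standardized column by column (so that, for example, humidity in percent does not dominate NDVI values in [0, 1]), and the per-sample feature vectors are then concatenated. The z-score scaling step and the example features are assumptions added for illustration; the text only states that the data are concatenated.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardise every column of one modality to zero mean and unit variance."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]  # constant column: keep scale
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def early_fusion(*modalities):
    """Measurement-level fusion: scale each modality, then concatenate per sample.

    Each modality is a list of feature rows; row i of every modality must describe
    the same sample (same plot and time), which is the alignment early fusion needs.
    """
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("all modalities must contain the same samples in the same order")
    scaled = [zscore_columns(m) for m in modalities]
    return [[v for m in scaled for v in m[i]] for i in range(n)]


# Hypothetical aligned samples: (air temperature in °C, relative humidity in %) and NDVI.
weather = [[22.0, 80.0], [25.0, 60.0], [19.0, 90.0]]
spectral = [[0.81], [0.62], [0.74]]
fused = early_fusion(weather, spectral)
```

The alignment check makes the main weakness of early fusion explicit: heterogeneous sources with different sampling rates or footprints must be matched sample by sample before a single vector can be built.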
Feature fusion combines the results of early fusion and individual unimodal predictors by merging feature vectors, which allows heterogeneous data from different sources to be combined. In one study, deep convolutional neural networks (DCNNs) were used at multiple levels of multimodal data fusion. Feature-level fusion, with features learned from raw data, was performed after the raw data extraction phase: DCNNs were applied to each type of data to learn features, and their outputs were then extracted as the learned features.
Decision fusion (or late fusion) processes data from each sensor separately to obtain high-level inference decisions, which are then combined in a second stage. In other words, decision-level fusion combines information from different sensors after each sensor has made a preliminary decision. In , a use case of a weighted decision fusion architecture over multiple sensors is presented: weighted majority voting (WMV) merges the resulting decision vectors, with each sensor's decision weighted by a confidence measure (or weight).
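Weighted majority voting, as described above, can be sketched as follows: each sensor-level classifier votes for a label, each vote counts with the sensor's confidence weight, and the label with the highest total wins. Using, for example, each sensor's validation accuracy as its weight is an assumption; the sensors, labels and weights below are hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(decisions, weights):
    """Late fusion: sum the confidence weights behind each predicted label.

    decisions: one predicted label per sensor
    weights:   one non-negative confidence per sensor, in the same order
    Returns the winning label and the per-label vote totals.
    """
    if len(decisions) != len(weights):
        raise ValueError("one weight per sensor decision is required")
    totals = defaultdict(float)
    for label, w in zip(decisions, weights):
        totals[label] += w
    return max(totals, key=totals.get), dict(totals)


# Hypothetical decisions from a multispectral camera, a thermal camera and a weather model.
decisions = ["diseased", "healthy", "healthy"]
weights = [0.9, 0.3, 0.4]
label, totals = weighted_majority_vote(decisions, weights)
```

A simple majority would output "healthy" here, whereas the weighted vote follows the single, more reliable sensor, which is the point of attaching a confidence measure to each decision.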
Hybrid fusion merges information at two or more levels. In the hybrid approach proposed in , the authors developed a technique for merging different CNN classifiers for object detection in changing environments. Three input modalities were used: RGB, depth and optical flow. The CifarNet architecture served as the single-expert model, and the outputs of the expert networks were then fused with weights determined by an additional network called the gating network.
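The gating idea can be illustrated with a minimal mixture-of-experts combination: a softmax turns gating scores into weights that sum to one, and the experts' class probabilities are averaged with those weights. In the approach above the gating scores are produced by a trained gating network from the inputs; here they are fixed, hypothetical numbers, as are the expert outputs.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(expert_probs, gate_logits):
    """Weight each expert's class probabilities by softmax(gate_logits) and sum them."""
    gates = softmax(gate_logits)
    n_classes = len(expert_probs[0])
    fused = [sum(g * p[c] for g, p in zip(gates, expert_probs)) for c in range(n_classes)]
    return fused, gates


# Hypothetical class probabilities from three single-modality experts.
rgb = [0.7, 0.2, 0.1]
depth = [0.3, 0.4, 0.3]
flow = [0.6, 0.3, 0.1]
# Hypothetical gating scores: the gate trusts RGB most and depth least for this input.
fused, gates = gated_fusion([rgb, depth, flow], [2.0, 0.0, 1.0])
```

Because the gate weights sum to one and each expert outputs a probability distribution, the fused output is itself a valid distribution; in a changing environment the gate can shift trust between modalities per input.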
The Tensor Fusion Network (TFN) consists mainly of a tensor fusion layer that models unimodal, bimodal and trimodal interactions using a three-fold Cartesian product of the modality representations. Low-rank Multimodal Fusion (LMF) has been proposed to identify speakers' emotions from their verbal and non-verbal behaviors, based on visual, audio and language data. Three YouTube video databases were used, annotated with sentiment, speaker traits and emotions. The acoustic and visual modalities were each represented by a two-layer neural network, while a long short-term memory (LSTM) network was used to extract representations of the linguistic modality.
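The three-fold product behind tensor fusion can be written as z = [z_a; 1] ⊗ [z_v; 1] ⊗ [z_l; 1]: appending a constant 1 to each modality vector before the outer product keeps the unimodal features and all bimodal products alongside the trimodal ones. The sketch below builds this flattened tensor for small hypothetical embeddings; in a real model the embeddings come from the modality subnetworks and the result feeds further layers.

```python
def append_one(z):
    """Append a constant 1 so lower-order (unimodal, bimodal) terms survive the product."""
    return list(z) + [1.0]


def tensor_fusion(z_audio, z_video, z_text):
    """Flattened outer product [z_a; 1] x [z_v; 1] x [z_l; 1].

    The result has (d_a + 1)(d_v + 1)(d_l + 1) entries: every trimodal and bimodal
    product, the original unimodal features, and a trailing constant 1.
    """
    a, v, l = append_one(z_audio), append_one(z_video), append_one(z_text)
    return [x * y * w for x in a for y in v for w in l]


# Hypothetical embeddings: 2-D audio, 1-D visual and 3-D language representations.
z_fused = tensor_fusion([0.5, 2.0], [3.0], [1.0, -1.0, 0.5])
```

The output size grows multiplicatively with the embedding sizes ((2 + 1)(1 + 1)(3 + 1) = 24 here), which is the cost that low-rank approaches such as LMF are designed to avoid by factorizing the fusion weights instead of materializing this tensor.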
Multimodal Fusion Architecture Search (MFAS) is a generic approach that explores a large number of possible fusion architectures and selects the best-performing ones. MFAS is inspired by progressive neural architecture search (PNAS), and its search is efficiently guided by temperature-based sampling of candidate architectures. Tested on three datasets, MFAS proved its efficacy against the state-of-the-art results on those datasets.
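Temperature-based sampling can be sketched as drawing candidates with probabilities proportional to softmax(score / T): a low temperature concentrates the draws on the best-scoring candidates, while a high temperature spreads them out and encourages exploration. This is a simplified stand-in for the sampling step of an architecture search, not the MFAS implementation; the candidate descriptions (which layers to fuse and with which activation) and their scores are hypothetical.

```python
import math
import random


def temperature_probs(scores, temperature):
    """Softmax of scores / temperature (numerically stable)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_architectures(candidates, scores, temperature, k, seed=0):
    """Draw k candidates with replacement according to temperature-scaled scores."""
    rng = random.Random(seed)  # seeded for reproducibility
    probs = temperature_probs(scores, temperature)
    return rng.choices(candidates, weights=probs, k=k)


# Hypothetical fusion candidates: (image layer, sensor layer, fusion activation).
candidates = [("conv3", "fc1", "relu"), ("conv5", "fc2", "sigmoid"), ("conv4", "fc1", "relu")]
scores = [0.82, 0.75, 0.88]  # e.g. surrogate-predicted accuracies (assumed)

probs_cold = temperature_probs(scores, 0.01)
probs_warm = temperature_probs(scores, 10.0)
picks = sample_architectures(candidates, scores, temperature=0.5, k=5)
```

Annealing the temperature over search iterations moves the procedure from exploration toward exploitation, which is why sampling is used rather than always keeping the current top-scoring candidates.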
In , the authors compared four types of fusion (late, MoE, LMF and mid) on image and signal modalities for automatic texture detection of objects. The fusion methods produced latent vectors that were fed into the corresponding artificial neural networks (ANNs). In degradation scenarios, the late, MoE and mid fusion methods behaved similarly. The fusion architectures allowed the ANNs to achieve good results in the texture detection task.
To conclude, machine learning-based multimodal fusion approaches have significant potential to solve open issues in agriculture by merging different types of data. We believe that applying these advanced techniques to disease detection can provide a better understanding of the plant environment and thus improve prediction performance.
Although advanced fusion techniques are a rapidly growing area in agriculture, the literature still lacks studies on disease detection in this domain. The applications of data fusion in agriculture reported in the literature include yield prediction, crop identification, land monitoring and disease detection.
In , the authors investigated the relationship between canopy thermal information and grain yield by fusing data from different sensors. Using multiple sensors installed on a UAV, they extracted spectral (vegetation indices, VIs), structural (vegetation fraction (VF), canopy height (CH)), thermal (normalized relative canopy temperature (NRCT)) and texture information from the canopy. Two fusion models were used in this study: an input-level feature fusion DNN (DNN-F1) and an intermediate-level feature fusion DNN (DNN-F2). The models were compared in terms of prediction accuracy, spatial adaptability and robustness across different types of models.
In this study, the authors exploited spatio-temporal data to segment satellite images of vegetation fields. The images were captured by the Gaofen-1 and Gaofen-2 satellites. The authors developed an active 3D CNN architecture to extract information from the multi-temporal images. In the same context, some researchers tested the feasibility of temporal CNNs (TempCNNs) for satellite image classification.
To overcome the limited spatial resolution of satellite images, some researchers have developed resolution improvement techniques based on data fusion. In this context, the authors in  developed a data fusion framework based on an extended super-resolution convolutional neural network (ESRCNN), specifically to blend Landsat-8 and Sentinel-2 images of 20 m and 10 m spatial resolution, respectively. The fusion approach outperformed the no-fusion approach with the same models, and the best accuracies on the test set were achieved with SegNet, reaching 84.4% without and 89.8% with image fusion. In another study, the authors combined spectral canopy information (vegetation indices) extracted from WorldView-2/3 data with canopy structure features (canopy cover and height) calculated from UAV RGB images.
One of the first attempts to integrate multisource data for disease detection used both meteorological data and satellite scenes. In , the authors proposed a multi-context fusion network for crop disease detection; however, their method suffers from data imbalance caused by seasonal and regional difficulties across the various categories of crop diseases. In another study on banana plant detection in the field, the authors trained the RetinaNet object detection model on UAV RGB images and developed a custom classifier for simultaneous banana tree localization and disease classification.
The applications of data fusion in agriculture presented in this section can be divided into three types. Spatio-spectral fusion is a multi-band fusion that combines fine-spatial and fine-spectral data. Spatio-temporal fusion blends data with fine spatial resolution but coarse temporal resolution (revisit frequency) with data that have fine temporal resolution but coarse spatial resolution, with the objective of producing data at fine spatio-temporal resolution. Finally, multimodal fusion corresponds to heterogeneous multi-sensor fusion.
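The baseline assumption behind many spatio-temporal fusion methods, popularized by STARFM, can be written per pixel as F(t2) ≈ F(t1) + [C(t2) - C(t1)], where F is the fine-resolution image and C the coarse-resolution image resampled to the fine grid: the change seen at coarse resolution between two dates is added to the earlier fine image. The sketch applies only this baseline rule with nearest-neighbour resampling; STARFM itself additionally weights spectrally similar neighbouring pixels, which is omitted here, and the NDVI grids are hypothetical.

```python
def resample_nearest(coarse, factor):
    """Nearest-neighbour upsampling of a 2-D grid by an integer factor."""
    return [[coarse[i // factor][j // factor]
             for j in range(len(coarse[0]) * factor)]
            for i in range(len(coarse) * factor)]


def spatiotemporal_fuse(fine_t1, coarse_t1, coarse_t2, factor):
    """Predict a fine image at t2 from a fine image at t1 plus the coarse temporal change."""
    c1 = resample_nearest(coarse_t1, factor)
    c2 = resample_nearest(coarse_t2, factor)
    return [[f + (b - a) for f, a, b in zip(frow, arow, brow)]
            for frow, arow, brow in zip(fine_t1, c1, c2)]


# Hypothetical NDVI: a 4x4 fine image at t1 and 2x2 coarse images at t1 and t2.
fine_t1 = [[0.80, 0.78, 0.60, 0.62],
           [0.79, 0.81, 0.61, 0.59],
           [0.70, 0.72, 0.50, 0.52],
           [0.71, 0.69, 0.51, 0.49]]
coarse_t1 = [[0.795, 0.605], [0.705, 0.505]]
coarse_t2 = [[0.745, 0.605], [0.705, 0.405]]  # NDVI drops in two coarse cells
pred = spatiotemporal_fuse(fine_t1, coarse_t1, coarse_t2, factor=2)
```

The prediction keeps the fine spatial detail of the first date while inheriting the temporal change observed by the frequently revisiting sensor; it fails where the change is smaller than a coarse pixel, which is exactly where disease foci often start.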
In addition to noise, observed data may be non-commensurable, with different resolutions and incompatible sizes or alignment, and a pre-processing model should be considered to solve this problem. Furthermore, different data sources may provide contradictory data or missing values. Once the data are ready for the learning process, unbalanced data, i.e., unequal representation of the classes, can also affect the prediction rate. The biggest constraint of data fusion is thus multimodality, i.e., data from distinct types of sensors, for which different fusion architectures can be adopted.
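A common first mitigation for unbalanced classes is to weight the training loss by inverse class frequency, w_c = n_samples / (n_classes × n_c), so that every class contributes the same total weight; this is the same heuristic scikit-learn applies for its "balanced" class weights. The label counts below are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Class weight = n_samples / (n_classes * class_count).

    Rare classes get large weights, so each class contributes n_samples / n_classes
    to a weighted loss regardless of how often it occurs.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical field dataset dominated by healthy samples.
labels = ["healthy"] * 90 + ["rust"] * 8 + ["blight"] * 2
weights = inverse_frequency_weights(labels)
```

Weighting leaves the data untouched, which matters in fusion settings where oversampling one modality's rare samples would require synthesizing matching samples for every other modality.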
In addition to RGB and hyperspectral images, thermal images have proven very useful for plant disease detection. The main motivation is that plant leaf temperature can help predict plant health status. Several researchers have explored this type of image for disease detection at the leaf level, while others have combined thermal images with multispectral data for effective early detection from ground and aerial vehicles, because acquiring leaf images requires people to walk through the whole field, which is an energy- and time-consuming strategy.
Spectral imaging from UAVs provides important information on the soil and the upper part of plants over a large spectrum; UAVs are therefore used more often. Merging the two technologies can also broaden the spectrum of plants that can be processed while ensuring early detection accuracy. Depending on their spatial and spectral resolution, satellites can be an excellent alternative to UAVs for monitoring healthy plant growth. UAV and satellite imaging, which have proved useful in many agricultural applications, constitute a promising new field for plant disease detection on a larger scale.
Despite the usefulness of satellite images, this area of research faces several challenges. Clouds and their shadows are a major obstacle when processing and extracting disease signatures from high-resolution satellite images: when clouds cover vegetation, the acquired images become unusable. The main obstacles to crop monitoring and disease detection with satellites are rapid changes in agricultural land cover over relatively short time intervals, differences in seeding dates, atmospheric conditions and fertilization strategies; since it is difficult to determine whether reflectance changes are due to disease or to these factors, in-situ studies are required to validate predictions. High-resolution satellite images can nevertheless be a key approach for very large-scale disease detection.
Current imaging sensor technologies have many limitations for early disease detection. Associating data from multiple sensors can provide a better understanding of the growth and health status of the crop and thus better prediction rates. This explains the growing interest of the scientific community in multimodal data fusion for crop disease detection. The power of AI algorithms can be used to process multimodal data sources and predict crop diseases at earlier stages.
Indeed, neural network and deep neural network models have demonstrated a significant capacity in agriculture to monitor healthy crop growth and capture anomalies, outperforming traditional machine learning algorithms. SegNet and FCN outperformed the SVM model in both experimental fields and with different combinations of image bands, as shown in Table 7, where RGB_MS and NIR_MS are, respectively, the visual and near-infrared (NIR) bands of the multispectral image, and F-RGB_MS and F-NIR_MS are the high-resolution fusion results of RGB_MS and NIR_MS, respectively. The table also clearly shows the impact of image fusion on the recognition results: accuracies improved for all models, including the traditional machine learning model.
Thus, a correct diagnosis depends on the choice of DL architecture and on the type, quantity and quality of the data. Applying multimodal deep learning involves selecting a learning architecture and algorithm. Recently, multimodal fusion has shown considerable potential and is increasingly used in several domains such as healthcare, sentiment analysis, human–robot interaction, human activity recognition and object detection. In agriculture, several deep learning fusion approaches have been proposed, with applications in yield prediction, land monitoring, crop identification and disease detection.
The most widely used types of fusion in agriculture are the fusion of multi-sensor data from aerial vehicles, the fusion of multi-resolution satellite data and the fusion of satellite and UAV images. For our specific task, using other data sources could therefore enhance early disease detection performance. However, few multimodal fusion studies have been conducted, particularly for disease detection. The promising multimodal fusion results presented in this paper demonstrate the high potential of deep learning fusion models for prediction with multimodal data, which creates opportunities for further research.