Estimate Soil Organic Carbon from Remote Sensing: History
Please note this is an old version of this entry, which may differ significantly from the current revision.
Subjects: Remote Sensing

Monitoring soil organic carbon (SOC) typically requires a labor-intensive soil sampling campaign followed by laboratory testing, which is both expensive and impractical for generating useful, spatially continuous data products.

  • deep neural networks
  • land use
  • image segmentation
  • U-Net
  • environment

1. Introduction

Soil represents a complex mixture of organic and inorganic constituents with different physical and chemical properties, which vary significantly between locations and even within a single field [1]. It is a key component of terrestrial ecosystems, as it facilitates the circulation of energy and materials between the atmosphere and the biosphere [2].
Soil health can be defined as the ability of the soil to function effectively as a component in a thriving ecosystem [3]. In order to ensure effective monitoring and enable adequate assessment of the condition of the soil, one needs to select appropriate indicators of its condition. The indicators should meet certain criteria: they should be accepted by experts as valid; their measurement should be carried out routinely and on a large scale; they need to be understood and accepted by the general population in order to achieve a global impact [4].
Soil organic carbon (SOC) content is a widely accepted indicator of soil quality, as SOC plays a central role in various soil functions [5]. SOC measurement is a common component of soil property analysis. Furthermore, carbon as an element is well known and recognized by the global population [6]. All of this makes SOC a valuable indicator for assessing and monitoring changes in soil health.
The amount and quality of SOC are closely related to key soil functions, including nutrient mineralization, aggregate stability, air and water permeability, water retention, and flood control ability [5]. These soil functions are, in turn, related to a wide range of ecosystem attributes. For example, high SOC levels in mineral soils tend to correlate with high plant productivity, which has a positive effect on wildlife habitat, distribution, and population size [7]. Through the protection and increase of stored SOC, one can protect or increase soil fertility, reduce soil erosion, and reduce habitat conversions [8].
In addition to its importance for the soil, SOC has the potential to help neutralize the negative effects of increasing concentrations of CO2 in the atmosphere (which significantly contribute to global warming and climate change [9]) and help ensure food security around the world [10].
While SOC plays a key role in mitigating climate change by acting as a carbon sink, the historical loss of carbon from this pool [11] has been significant, and the potential for future accelerated loss under warming scenarios is a serious threat [12,13].
As a natural solution to fight climate change, strategies that involve conserving existing SOC stocks (avoidance of losses) and replenishing stocks in carbon-depleted soils [14] can be used as a means of achieving the United Nations Sustainable Development Goals (UNSDG), the goals of the United Nations Framework Convention on Climate Change (UNFCCC), and those of the United Nations Convention to Combat Desertification (UNCCD) [15].
Despite the scientific consensus about the potential and myriad benefits that can be brought about by the development and application of soil organic carbon storage and sequestration techniques, they remain limited in practice. A fundamental issue affecting the adoption of such methodologies is the lack of accurate and cost-effective ways of measuring SOC content in the top layer of the soil (as this is most affected by land use, agricultural practices, etc.).
When it comes to measuring global SOC stocks, many estimates have been published over the past decades, and most studies report a global SOC estimate of approximately 1500 Pg of carbon (Pg C), but there is considerable variation among estimates (ranging from 504 to 3000 Pg C) [16].
The large variation in the estimates of global SOC stocks arises from differences in the sampling period, the intensity and spatial resolution of soil profile databases, as well as from differences in approaches to calculating the estimates themselves [17]. The uneven distribution of georeferenced soil profiles around the world is another reason for such a large variation in the estimates [18]. In addition, there is no consensus when it comes to including inorganic carbon, different levels of rock content [19], and the effects of natural or anthropogenic phenomena (such as flooding, erosion, fire, soil fertilization, and plowing [20]) in carbon stock assessments.

2. Monitoring Soil Organic Carbon Based on Remote Sensing 

In recent years, remote sensing has emerged as a particularly effective method for tracking agricultural and environmental changes [23,24,25,26]. The technology relies on diverse sensors and platforms, such as satellite constellations and Unmanned Aerial Systems (UAS), to gather data, which are then typically processed using advanced algorithms, often from the realm of machine learning (ML) and deep learning (DL) [27].
Deep learning represents a specialized subset of machine learning that excels at learning from large, unstructured datasets using complex, layered neural networks. While traditional ML algorithms work well with smaller, structured datasets and often require manual feature selection, DL algorithms automatically extract features and patterns, especially from data like images and speech. This makes deep learning more powerful for certain applications, but it requires more computational resources and is often less interpretable than conventional machine learning techniques.
The ongoing advancements in remote sensing represent a promising alternative to traditional SOC monitoring. Toth and Jóžków provide a fairly recent review of different remote sensing platforms and sensors available today [28].
In the study presented here, the focus is on inferring SOC content from satellite data only. Most studies focused on determining SOC, however, rely on data (spectrograms) collected from hand-held sensors. While the accuracy achieved in this way is typically higher than using satellite imagery, such approaches can hardly be scaled to enable continuous monitoring of carbon stocks on a global level.
Gomez et al. [29] presented an early, albeit limited, study (based on just 146 soil samples) that compared the results achievable by applying ML methods to in-the-field Vis–NIR measurements with those achievable by applying them to hyperspectral satellite imagery. The images were obtained from the Hyperion sensor on the EO-1 satellite, which is, unfortunately, no longer functional. There is no longer an active hyperspectral satellite that captures images in the VNIR–SWIR region, which makes their work hard to replicate. In addition to modeling the whole dataset used in the study, the authors focused on specific land cover classes (cropping soils, pasture soils) and opted for partial least-squares regression as their SOC predictor. Gomez et al. observed that the SOC in their cropping soils ranged between 0.54% and 1% and was lower than in the pastures, where SOC was in the 1.08% to 5.1% range. They evaluated their methodology based on R2 and the Root-Mean-Squared Error (RMSE). The models based on satellite imagery did not perform well for cropping soils (R2 of 0.04 and RMSE of 0.11) and lagged significantly behind the hand-held-sensor-based models in terms of R2 (R2 of 0.16 and RMSE of 0.1). However, when evaluated on pastures and on the whole dataset, the two approaches achieved comparable and much better performance. The approach based solely on satellite data at their native resolution achieved an R2 of 0.51, but the RMSE was quite high (0.73% SOC). Thus, the study showed that land cover is very important when it comes to modeling and estimating SOC remotely.
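The two metrics that recur in the studies reviewed here can be sketched as follows. This is a minimal illustration that assumes the usual definitions (R2 as one minus the ratio of the residual to the total sum of squares, RMSE as the square root of the mean squared residual). The function names and the sample values are hypothetical and are not taken from any of the cited studies.

```python
import math

def rmse(observed, predicted):
    """Root-mean-squared error, in the units of the inputs (e.g. % SOC)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up SOC values (% SOC), for illustration only.
soc_measured = [1.0, 2.0, 3.0, 4.0]
soc_predicted = [1.5, 2.0, 2.5, 4.0]

print("R2:", r2(soc_measured, soc_predicted))
print("RMSE:", rmse(soc_measured, soc_predicted))
```

Because R2 is normalized by the spread of the measured values, a dataset with a narrow SOC range (such as the 0.54% to 1% range reported for the cropping soils) can yield a low R2 even when the RMSE is small in absolute terms.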
More recently, Wang et al. [30] applied different ML techniques to estimate SOC stock in the semi-arid rangelands of eastern Australia, with a focus on evaluating the impact of seasonal fractional cover on model performance. These features were used to extend other hand-crafted features derived from satellite imagery, as well as remotely sensed climate features such as rainfall and temperature, and data about lithology. They trained and evaluated their models using a limited number of soil samples (705). They used random forests (RF) [31], Boosted Regression Trees (BRT) [32], and support vector machines (SVM) [33] to model their data. The RF approach performed best and achieved an R2 of 0.47 on their dataset.
Several studies evaluated the effectiveness of hyperspectral data obtained from airborne sensors and extended their findings to estimate the performance of sensors expected to be deployed in the future [34,35]. While we focus on multispectral data in the study presented here, it is worth noting that Steinberg et al. achieved a relatively high R2 (0.74) and an RMSE of 0.22% for SOC using autoPLSR applied to hyperspectral data from an airborne sensor [35], albeit relying on a very limited set of soil samples (81) obtained for a 7 km2 area in Luxembourg, 40% of which were used as a test set. Once sufficient hyperspectral data are available, the methodology we propose can easily be adapted to that domain, which may lead to even better performance.
Over the last decade, deep learning has revolutionized the area of machine learning and artificial intelligence and has become the dominant paradigm in the domain. The crucial advance over previously used methods is that the approach relies on end-to-end learning, which allows the ML models to learn the features on which to base their decisions and estimates directly from the raw input data, instead of relying on human-engineered features [36].
Yuan et al. provided an overview of the applications of both classical neural networks and DL models to the monitoring of environmental parameters using remote sensing data [37]. They showed that DL outperformed traditional ML models and has led to significant improvements in many applications, including land cover mapping, vegetation parameters, soil moisture, evapotranspiration, agricultural yield prediction, etc. The authors correctly highlighted a limitation of DL approaches, namely the relatively limited amounts of training data available, as well as the potential of transfer learning to circumvent this problem. They mentioned two types of transfer learning: region-based and data-based. The first relates to pretraining on a geographical region for which ample data are available and adjusting the model to a different region with limited data available. In the ML community, this is usually referred to as fine-tuning. The second is more in line with the meaning of transfer learning in the ML domain and relates to transferring models trained on data obtained from a sensor or a group of sensors to other sensors. In the study presented here, we use a third kind of transfer learning, common in the computer vision community [38], where the initial model is trained on the same type of input data (Sentinel-2), but for a different visual task (land cover classification), and is used as a feature extractor for the final model (which performs SOC estimation in our case).
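The feature-extractor form of transfer learning described above can be illustrated with a toy sketch. It is not the architecture used in the study: the "pretrained" encoder is a hypothetical two-unit layer with hard-coded weights standing in for a network trained on another task, and only a small linear head is fitted to made-up SOC values by gradient descent. All names, weights, and data are assumptions for illustration.

```python
def frozen_features(bands):
    """Hypothetical stand-in for a pretrained encoder; its weights stay fixed."""
    pretrained = [[0.5, -0.25], [0.1, 0.3]]
    # One ReLU unit per row of the fixed weight matrix.
    return [max(0.0, sum(w * b for w, b in zip(row, bands))) for row in pretrained]

def mse(weights, bias, feats, targets):
    """Mean squared error of the linear head on the given features."""
    preds = [sum(w * f for w, f in zip(weights, fs)) + bias for fs in feats]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def train_head(feats, targets, lr=0.1, epochs=500):
    """Fit only the linear head (weights and bias) by batch gradient descent."""
    weights, bias = [0.0] * len(feats[0]), 0.0
    n = len(targets)
    for _ in range(epochs):
        errors = [sum(w * f for w, f in zip(weights, fs)) + bias - t
                  for fs, t in zip(feats, targets)]
        grad_w = [2.0 / n * sum(e * fs[j] for e, fs in zip(errors, feats))
                  for j in range(len(weights))]
        grad_b = 2.0 / n * sum(errors)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

# Made-up two-band reflectances and SOC targets (% SOC), for illustration only.
pixels = [[0.2, 0.4], [0.6, 0.2], [0.4, 0.8], [0.8, 0.6]]
soc = [1.0, 2.0, 1.5, 2.5]

feats = [frozen_features(p) for p in pixels]   # encoder is applied, never trained
head_w, head_b = train_head(feats, soc)        # only the head is learned
print("MSE before:", mse([0.0, 0.0], 0.0, feats, soc))
print("MSE after: ", mse(head_w, head_b, feats, soc))
```

The point of the design is that the small labeled SOC dataset only has to determine the few parameters of the head, while the encoder reuses what was learned from the larger dataset of the other task.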
While the first application that Yuan et al. discussed was land cover, no approaches to estimating SOC were mentioned in this study. In addition, while approaches based on different DNN architectures were discussed (most relying on convolutional neural networks), none were identified in the study that use the U-Net model.
Rakhlin et al., however, successfully applied U-Net with Lovász softmax loss for land cover classification using RGB data made available as part of the DeepGlobe Challenge [39].
Yang et al. used a CNN to try to infer SOC for a central location based on input data that covered the surrounding region [40]. The input of their model was environmental variables combined with MODIS MCD12Q2 phenology variables. They trained and evaluated their approach on a limited set of 733 samples, collected in Anhui Province of China. This limited the complexity of the CNN they could use, since no transfer learning was used in the study, but the CNN fared better than a random forest model, achieving a modest R2 of 0.26.
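The idea of predicting SOC for a central location from the surrounding region can be sketched as a patch-extraction step that would precede the CNN. The snippet below is only an assumed illustration of that step on a toy raster stored as nested lists. The function name, the window size, and the choice to skip locations too close to the raster edge are hypothetical and are not details reported by Yang et al.

```python
def extract_patch(raster, row, col, half_size):
    """Return the square window centred on (row, col), or None near the edge.

    The window has (2 * half_size + 1) rows and columns.
    """
    n_rows, n_cols = len(raster), len(raster[0])
    if (row - half_size < 0 or col - half_size < 0
            or row + half_size >= n_rows or col + half_size >= n_cols):
        return None  # the window would extend beyond the raster
    return [r[col - half_size: col + half_size + 1]
            for r in raster[row - half_size: row + half_size + 1]]

# Toy 5 x 5 single-band raster whose values encode their position (10*row + col).
raster = [[10 * r + c for c in range(5)] for r in range(5)]

centre_patch = extract_patch(raster, 2, 2, 1)   # 3 x 3 window around the centre
edge_patch = extract_patch(raster, 0, 0, 1)     # too close to the corner
print(centre_patch)
print(edge_patch)
```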
Emadi et al. [41] focused on Northern Iran and used a large number of input features (105). Most were human-crafted indices extracted from Landsat-8 and MODIS satellite imagery, but their input also included topography-related parameters, such as curvature, slope, etc. Using a dataset of 1879 composite soil samples and relying on 10-fold cross-validation, they compared the performance of several traditional ML algorithms (support vector machines, multi-layer perceptron, regression decision trees, random forests, and extreme gradient boosting) with a DL model when predicting SOC. The DL model that showed the best results in the study was a fairly simple fully connected neural net, with seven hidden layers and 50 neurons in each of them, but it still outperformed the other methods tested. The authors reported a comparatively large R2 value of 0.65, with an RMSE of 0.75% SOC.
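To give a sense of scale for the setup described above, the following sketch counts the trainable parameters of a fully connected network with 105 inputs and seven hidden layers of 50 neurons, and splits 1879 sample indices into 10 folds. The single output neuron, the presence of a bias term in every layer, and the contiguous (unshuffled) fold assignment are assumptions made for illustration. The function names are hypothetical.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def kfold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more sample
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# 105 input features, seven hidden layers of 50 neurons, one assumed SOC output.
layers = [105] + [50] * 7 + [1]
n_params = dense_param_count(layers)

folds = kfold_indices(1879, 10)
fold_sizes = [len(f) for f in folds]
print("trainable parameters:", n_params)
print("fold sizes:", fold_sizes)
```

Under these assumptions the network has 20651 trainable parameters, roughly eleven per soil sample, and each fold holds out 187 or 188 samples. In practice the indices would normally be shuffled before splitting, which the sketch omits.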
In a recent study, Castaldi et al. [42] evaluated the capability of Sentinel-2 time series to estimate soil organic carbon and clay content at the local scale in croplands. The pipeline they proposed relies heavily on human engineering, both in terms of the features they derived from Sentinel-2 imagery (NDVI, NBR2, BSI, S2WI) and in terms of how these were used to create the input to their machine learning models. In terms of modeling, they did not opt for deep neural networks, but for the Quantile Regression Forest (QRF) algorithm, QRF with added longitude and latitude as covariates, and a hybrid approach, the Linear Mixed-Effect Model (LMEM), which included the spatial autocorrelation of the soil properties. While the latter takes spatial information into account up to a point, their approach is essentially pixel based, which differs from the one proposed here. In addition, the authors of the study aimed to assess the capability of their approach in a very limited scenario, by creating and evaluating models for each of their test sites separately. No attempt was made to create a single model that could be applied globally, or at least for a large part of the Earth’s surface. Thus, the results they achieved could be viewed as a sort of “blue-sky performance”, which could be reached by a global model using Sentinel-2 images as the input. The R2 of the best of Castaldi et al.’s models ranged from 0.26 to an impressive 0.96 for different locations, with an average R2 of 0.67. The RMSE (in % SOC) ranged from 0.09 to 0.22 and was 0.152 on average.
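Two of the hand-crafted indices mentioned above, NDVI and NBR2, are normalized differences of band reflectances and can be sketched per pixel as follows. The sketch assumes the commonly used definitions and the usual Sentinel-2 band assignment (B8 for near-infrared, B4 for red, B11 and B12 for the two short-wave infrared bands). These choices, the function names, and the reflectance values are assumptions for illustration and are not taken from Castaldi et al. BSI and S2WI are left out.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b); returns NaN when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else float("nan")

def ndvi(nir, red):
    """Normalized Difference Vegetation Index (assumed bands: B8 and B4)."""
    return normalized_difference(nir, red)

def nbr2(swir1, swir2):
    """Normalized Burn Ratio 2 (assumed bands: B11 and B12)."""
    return normalized_difference(swir1, swir2)

# Made-up surface reflectances for a single pixel, for illustration only.
pixel = {"B4": 0.1, "B8": 0.4, "B11": 0.3, "B12": 0.2}

pixel_ndvi = ndvi(pixel["B8"], pixel["B4"])     # approximately 0.6
pixel_nbr2 = nbr2(pixel["B11"], pixel["B12"])   # approximately 0.2
print(pixel_ndvi, pixel_nbr2)
```

Each index is computed independently for every pixel, which is the sense in which such a pipeline is pixel based: no information from neighbouring pixels enters the features.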

This entry is adapted from the peer-reviewed paper 10.3390/rs16040655
