Remote Sensing of Soil Organic Carbon (SOC)

What is SOC and Why is it Important?

Soil organic carbon (SOC) is a key component of the global carbon cycle, and it plays a vital role in the functioning of terrestrial ecosystems. SOC levels are primarily determined by photosynthesis (fixation of atmospheric CO2 into plant biomass), respiration, and decomposition. The removal of atmospheric CO2 via photosynthesis, and its storage in the soil as SOC, is known as soil carbon sequestration. In the context of climate change, soil carbon sequestration is emerging as an important tool for mitigating anthropogenic carbon emissions [1].

Figure 1: Carbon balance within the soil (brown box). SOC is added by photosynthesis, and lost via respiration. A persistent SOC store, called humus, is produced by root decomposition, and the actions of soil organisms [1].


Remote Sensing of SOC

Recent research has shown that SOC can be estimated by combining satellite-based remote sensing with machine learning. Castaldi et al. [2] compared SOC predictions from Sentinel-2 (S2) with those from airborne hyperspectral data, using the following methodology:

- download three cloud-free S2 images, with crop exposure conditions similar to those during the airborne acquisitions.

- atmospherically correct for Bottom of Atmosphere (BOA) reflectance, using ESA SNAP.

- resample the atmospherically corrected images to 10 m.

- select nine S2 bands: B2 (490 nm), B3 (560 nm), B4 (665 nm), B5 (705 nm), B6 (740 nm), B7 (783 nm), B8 (842 nm), B11 (1610 nm) and B12 (2190 nm).

- mask everything that was not bare cropland soil at S2 acquisition time (bare soil pixels have NDVI <0.25).

- gather soil sample data: in 2015 in Belgium (170 samples), in 2016 in Luxembourg (194 samples), and in 2017 in Germany (231 samples). These locations were selected to encompass a range of SOC levels and soil types.

- measure ground truth SOC content for each soil sample (laboratory SOC).

- pair the ground truth SOC data with the S2 and airborne sensor data to form the training dataset.

- test two multivariate models on all spectral data: partial least squares regression (PLSR) and random forest (RF), evaluating prediction accuracy by the root mean square error (RMSE).

- train each model on matrices of the spectral bands (independent variables) and SOC content (dependent variable), using 10-fold cross-validation.
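
The bare-soil masking step in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the red (B4) and near-infrared (B8) bands are already loaded as NumPy arrays of BOA reflectance, and the function names and toy values are hypothetical. Only the 0.25 NDVI threshold comes from the text.

```python
import numpy as np

NDVI_BARE_SOIL_MAX = 0.25  # threshold quoted in the methodology list


def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red); NaN where the denominator is zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def bare_soil_mask(red, nir, threshold=NDVI_BARE_SOIL_MAX):
    """True where a pixel counts as bare soil (NDVI below the threshold)."""
    # NaN compares as False, so no-data pixels are excluded as well.
    return ndvi(red, nir) < threshold


# Toy 2 x 2 reflectance tiles standing in for B4 (red) and B8 (NIR).
b4 = np.array([[0.20, 0.05], [0.20, 0.00]])
b8 = np.array([[0.25, 0.45], [0.30, 0.00]])
mask = bare_soil_mask(b4, b8)
print(mask)
```

Pixels where the mask is False (vegetated or no-data) would be dropped before pairing spectra with laboratory SOC values.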

The authors found that the spatial resolution of Sentinel-2 is adequate to describe SOC variability at both field and regional scales, and that the prediction accuracy obtained from Sentinel-2 data is similar to that obtained from airborne hyperspectral data. The most important spectral features for SOC prediction were located in the VIS region at 450, 590 and 664 nm, very close to the S2 bands in this spectral range (490, 560 and 665 nm) [2].

Figure 2. Soil organic carbon (SOC) maps of a field in Demmin area, obtained by HySpex (aerial) (a) and Sentinel-2 (b) data. Regional SOC is shown in the right figure [2].


Geosynergy Proof of Concept for Fitzroy Basin

We demonstrated a proof of concept (POC) for remote sensing of SOC in the Fitzroy Basin region of Queensland (QLD). In the absence of field soil samples, we used CSIRO data (Soil and Landscape Grid National Soil Attribute Maps) [4]. These are maps (GeoTIFFs) of bulk density, SOC, clay, silt, sand, available water capacity, nitrogen, phosphorus, depth of soil, depth of regolith, etc. (at various depths).

Note that the CSIRO data is not field data; it is a combination of historical and model-generated data. It is nonetheless sufficient to demonstrate the POC, and our plan is to build on this POC by progressively incorporating real field samples.

A correlation matrix showed that NTO (total nitrogen) and PTO (total phosphorus) were highly correlated with SOC, and they were therefore excluded to avoid data leakage. The other soil features are less strongly correlated with SOC, and more readily available from soil maps, so they can be used.
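
A leakage screen of this kind can be sketched as follows. This is an illustration under stated assumptions: the data is synthetic, the function name is hypothetical, and the 0.9 cut-off is an arbitrary choice, since the text does not say what threshold was used.

```python
import numpy as np


def leaky_features(X, y, names, threshold=0.9):
    """Return names of columns whose |Pearson r| with the target exceeds threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    flagged = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) > threshold:
            flagged.append(name)
    return flagged


rng = np.random.default_rng(0)
soc = rng.uniform(0.5, 4.0, size=500)              # synthetic SOC (%)
nto = 0.1 * soc + rng.normal(0, 0.01, size=500)    # synthetic nitrogen that tracks SOC closely
clay = rng.uniform(5, 60, size=500)                # synthetic clay, independent of SOC here
X = np.column_stack([nto, clay])

flagged = leaky_features(X, soc, ["NTO", "clay"])
print(flagged)
```

Columns flagged this way would be removed from the feature table before modeling.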

Figure 3: correlation matrix of soil data variables.

SOC was then modeled and predicted for a large region of the Fitzroy Basin (200 x 600 km), using CSIRO soil data and 6 Sentinel-2 products, as summarized in the following figure:

Figure 4: method. Soil data and Sentinel-2 rasters were aligned (increased cell size), exported to CSV, and modeled using an XGBoost regressor (model stacking could also be performed). Out-of-fold (OOF) predictions were collected over 10 K-fold splits, and then converted back from CSV to GeoTIFF.
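
The out-of-fold step can be sketched as below. To keep the example dependency-free, an ordinary least squares fit stands in for the XGBoost regressor; any model exposing a fit and predict step could be passed in instead. The data is synthetic and all function names are hypothetical.

```python
import numpy as np


def oof_predictions(X, y, fit, predict, n_splits=10, seed=0):
    """Collect out-of-fold predictions: each row is predicted by a model that never saw it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, n_splits)
    oof = np.full(len(y), np.nan)
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        model = fit(X[train_idx], y[train_idx])
        oof[test_idx] = predict(model, X[test_idx])
    return oof


def rmse(y_true, y_pred):
    """Root mean square error over all cells."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


# Stand-in model: ordinary least squares with an intercept (NOT XGBoost).
def fit_ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


rng = np.random.default_rng(42)
X = rng.uniform(0.0, 0.5, size=(300, 9))                 # synthetic per-cell features
true_coef = rng.normal(0, 2, size=9)
y = 2.0 + X @ true_coef + rng.normal(0, 0.1, size=300)   # synthetic "SOC"

oof = oof_predictions(X, y, fit_ols, predict_ols)
score = rmse(y, oof)
print(f"OOF RMSE: {score:.3f}")
```

Because every cell receives exactly one prediction from a model trained without it, the OOF vector can be scored directly against the reference values and mapped back onto the grid.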

Model accuracy was evaluated using the per-cell RMSE of the out-of-fold predictions. A scatter plot of the predicted versus ground truth data is shown below.

Figure 5: scatter plot of the OOF SOC predictions.

Lastly, the predicted SOC values were converted from CSV to GeoTIFF, allowing visual display over the AOI alongside the ground truth CSIRO data.
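
The round trip between a raster grid and tabular rows can be illustrated with plain NumPy. This sketch assumes each CSV row keeps the row and column index of its cell; the function names are hypothetical, and writing a georeferenced GeoTIFF would additionally need a raster library such as GDAL or rasterio, which is not shown.

```python
import numpy as np


def raster_to_table(grid):
    """Flatten a 2-D raster to (row, col, value) records, skipping NaN no-data cells."""
    rows, cols = np.nonzero(~np.isnan(grid))
    return np.column_stack([rows, cols, grid[rows, cols]])


def table_to_raster(table, shape, nodata=np.nan):
    """Rebuild a 2-D raster from (row, col, value) records."""
    grid = np.full(shape, nodata, dtype=float)
    rows = table[:, 0].astype(int)
    cols = table[:, 1].astype(int)
    grid[rows, cols] = table[:, 2]
    return grid


soc_grid = np.array([[1.2, np.nan, 0.8],
                     [2.1, 1.7, np.nan]])
table = raster_to_table(soc_grid)                  # rows that would be written to CSV
rebuilt = table_to_raster(table, soc_grid.shape)   # predictions placed back on the grid
print(rebuilt)
```

Keeping the cell indices alongside the features is what allows predictions made on shuffled tabular rows to be placed back at the right map location.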

Figure 6: SOC visualizations over Google Sat for predicted SOC and ground truth (Fitzroy Basin extent in red). Left: ground truth (CSIRO soil data). Right: predicted SOC.


Future Research and Development

Our Fitzroy Basin POC is based on CSIRO soil data, which is a combination of historical data and data extrapolated from models. This modality will now be developed further by incorporating ground truth field soil samples. An example soil sampling strategy, from the Castaldi et al. study, is shown below for the Luxembourg AOI.

Figure 7: Soil type map and Sentinel-2 image for the Luxembourg AOI. The red box indicates the flight area for aerial imagery. The white points indicate soil sampling locations [2].

Note that sampling dates should be as close as possible to Sentinel-2 sensing dates. The calibration dataset should be representative of the investigated area, and attempt to capture the full range of SOC values, as indicated by the sampling of different soil types [2].

Secondly, the features used for modeling will be iteratively developed. Regen Network, who are active in this space, have discussed using other soil parameters and geological predictors in their modeling, including clay composition %, silt composition %, elevation (DEM), and topographic wetness index. They also stress that SOC is only predicted to the depth of the soil samples (with a target sampling depth of 10-15 cm) [3].
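
For reference, the topographic wetness index is commonly defined as TWI = ln(a / tan(beta)), where a is the specific catchment area (upslope contributing area per unit contour width) and beta is the local slope. The sketch below applies that standard definition to single cells; the function name, the example values, and the minimum-slope floor used to avoid division by zero on flat cells are assumptions, not part of the Regen Network methodology.

```python
import math


def topographic_wetness_index(specific_catchment_area_m, slope_deg, min_slope_deg=0.1):
    """TWI = ln(a / tan(beta)); the slope is floored so flat cells do not divide by zero."""
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area_m / math.tan(beta))


twi_valley = topographic_wetness_index(500.0, 1.0)   # large upslope area, gentle slope
twi_ridge = topographic_wetness_index(10.0, 15.0)    # small upslope area, steep slope
print(round(twi_valley, 2), round(twi_ridge, 2))
```

Higher values indicate cells that tend to accumulate water, which is why the index is a candidate predictor alongside elevation.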

Thirdly, the modeling approach will be further developed. The Castaldi method only uses 2 models (RF and PLSR), and there is considerable scope for testing model ensembles (model stacking), as well as deep networks. The current approach extracts the soil and S2 data into tabular form for modeling. However, because the data is inherently 2-dimensional, it would seem to be a natural fit for approaches such as convolutional neural networks (CNNs).
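
To make the contrast with the tabular approach concrete, the sketch below cuts small image patches around target cells from a band stack, which is the kind of input a CNN would consume in place of one row per cell. It is only an illustration: the function name, the patch size, and the reflect padding at the edges are assumptions, and no network is trained here.

```python
import numpy as np


def extract_patches(stack, centers, size=5):
    """Cut (bands, size, size) windows centred on (row, col) cells from a (bands, H, W) stack."""
    if size % 2 == 0:
        raise ValueError("size must be odd so each patch has a centre cell")
    half = size // 2
    # Reflect-pad the spatial axes so cells near the edge still get full patches.
    padded = np.pad(stack, ((0, 0), (half, half), (half, half)), mode="reflect")
    patches = [padded[:, r:r + size, c:c + size] for r, c in centers]
    return np.stack(patches)


stack = np.arange(2 * 6 * 6, dtype=float).reshape(2, 6, 6)  # 2 toy bands on a 6 x 6 grid
centers = [(0, 0), (3, 3), (5, 2)]
patches = extract_patches(stack, centers, size=5)
print(patches.shape)
```

Each patch keeps the spatial neighbourhood of its centre cell, which is the information that is discarded when rasters are flattened to CSV rows.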

Lastly, it is worth stressing that this modality is built on Sentinel-2 data, which is highly advantageous due to its frequent revisit time (about five days at the equator with the two-satellite constellation), global coverage, and free and open data availability.

References

[1] Soil Carbon Storage. Todd A. Ontl.

https://www.nature.com/scitable/knowledge/library/soil-carbon-storage-84223790/

[2] Castaldi, F., Hueni, A., Chabrillat, S., Ward, K., Buttafuoco, G., Bomans, B., Vreys, K., Brell, M. and van Wesemael, B., 2019. Evaluating the capability of the Sentinel 2 data for soil organic carbon prediction in croplands. ISPRS Journal of Photogrammetry and Remote Sensing, 147, pp. 267-282.

[3] Regen Network, SOC remote sensing methodology.

https://app.regen.network/methodologies/carbonplus-grasslands

https://www.youtube.com/watch?v=nJ3IFYuYlcY

[4] CSIRO Soil and Landscape Grid National Soil Attribute Maps - Soil Organic Carbon Fractions (3" resolution) - Release 1

https://data.csiro.au/search/keyword?redirected=true&q=TERN%20Soil
