15 Access to Historic Soil Moisture Data – Earth Observation Datascience

Downloading, Reading, and Working with Copernicus WEkEO Soil Water Index 12.5 km

15.1 Overview

In this notebook we will examine how one can obtain historic soil moisture data. We will do this by looking into the Copernicus WEkEO portal.

15.2 Imports

from pathlib import Path

import cartopy.crs as ccrs
import hvplot.pandas  # noqa
import numpy as np
import pandas as pd
import xarray as xr
from dotenv import dotenv_values
from hda import Client, Configuration
from numpy.lib.recfunctions import unstructured_to_structured
from pyswi.swi_ts import calc_swi_ts

from envrs.download_path import make_url
from envrs.ssm_cmap import SSM_CMAP

ERROR 1: PROJ: proj_create_from_database: Open of /home/runner/work/eo-datascience/eo-datascience/.conda_envs/environmental-remote-sensing/share/proj failed

15.3 Copernicus WEkEO

The name WEkEO (pronounced [wikio]) refer to the four key organisations – EUMETSAT, ECMWF, Copernicus Programme, and Mercator Ocean International, which link to the broader user community (“WE”), with the purpose of knowledge advancement and sharing (“k”), focussing Earth and Environmental Observation (“EO”).

The platform is a web-based service that helps people easily access and use Earth observation data. Its main goal is to make it simple for researchers, companies, and decision-makers to get the data they need and analyze it without needing advanced tools or a lot of technical knowledge.

15.4 Registration

Similarly to H SAF FTP server, we need to register as an user to make full use of WEkEO, although some features are available without an account. We can follow these steps:

Go to the homepage of WEkEO.

Click on top right “Login” button, whereafter we select “Create an account” and fill out the form.

Now we can login and, for example, look at the available data by clicking on “Go” in the “Data Catalogue” field in the middle of the screen.

This shows us a data viewer where we can select from many different data products.

We will use the latter step to find the datasets we need and create query strings that we will use in a Python environment.

15.5 Harmonized Data Access

The Copernicus WEkEO platform offers a straightforward way to access data through its Application Programming Interface (API). APIs enable different programming languages to communicate and exchange information, effectively serving as a bridge for programs to utilize and extend each other’s functionalities. In the case of WEkEO, the API is called the Harmonized Data Access (HDA) API, which provides uniform access to the entire WEkEO catalogue, including subsetting and downloading capabilities. We will use the Python HDA client to interact with this API.

Install with either pip:

pip install hda

or conda:

conda install -c conda-forge hda

Now, we need to authenticate using the Configuration object and provide your account’s username and password. As in the previous notebook, we will use a dotenv environment for this purpose. To utilize dotenv, you need to add your username to the USER_WEKEO variable and your password to the PASS_WEKEO variable in the .env file.

conf = Configuration(
    user=dotenv_values(".env")["USER_WEKEO"],
    password=dotenv_values(".env")["PASS_WEKEO"],
)
hda_client = Client(config=conf)

15.6 Finding Soil Moisture Data

To find the SWI 12.5 km dataset in time series format, we have to return to the data viewer in the browser. Click in the “Layers” field on the + sign, and search for “Soil Water Index” in the “Free Text Search”.

Then select the fourth dataset from the top; “Soil Water Index Time Series 2007-present (discrete global grid), global, daily - version 3”. Click on the download button in the layers field, which unfolds a form which allows subsetting of the data. We can enter here a bounding box (bbox) for Mozambique. Scroll further down and click on “Show API request(s)”.

We will copy this query to our Python environment and use the hda_client to search for the files.

query = {
    "dataset_id": "EO:CLMS:DAT:CLMS_GLOBAL_SWI_12.5KM_V3_TIME-SERIES_NETCDF",
    "bbox": [
        30.315,
        -27.49,
        41.07,
        -10.20,
    ],
    "itemsPerPage": 200,
    "startIndex": 0,
}

matches = hda_client.search(query)

print(matches)

SearchResults[items=3,volume=312MB]

If you are content with the results then you can start the downloading process. Here we again create a directory to contain the downloaded files. The downloaded file with be in NetCDF format. NetCDF is a data formats that is tailored to multi-dimensional array-oriented scientific data with many options to label the data.

%%capture

local_path = "cgls_swi_12_5"

if not Path(local_path).is_dir():
    Path(local_path).mkdir()

matches.download(download_dir=local_path)

The downloaded data is projected on the same Fibonacci grid as the H SAF ASCAT 6.25 km surface soil moisture (see notebook 1). But the NetCDF data format requires another solution to read the data. Hence we will use another package to read the data. We use xarray, which is optimised to work with multi-dimensional arrays. We will introduce this package into more detail later on. We will call the method to_dataframe() to turn the xarray.Dataset into a pandas.DataFrame.

def _preprocess(ds: xr.Dataset) -> xr.DataArray:
    """Preprocess Dataset.

    Parameters
    ----------
    ds: Xarray.Dataset
        Individually loaded NetCDF

    Returns
    -------
    Xarray.DataArray: _Data array containing only SWI from depth 10 cm

    """
    return ds.SWI_010


df_wekeo = xr.open_mfdataset(
    "cgls_swi_12_5/*.nc",
    engine="h5netcdf",
    combine="nested",
    parallel=True,
    chunks=-1,
    preprocess=_preprocess,
    join="outer",
    compat="no_conflicts",
).to_dataframe()

df_wekeo.head()

		lon	lat	SWI_010
locations	time
565647	2007-01-01 12:00:00	30.104193	-10.060086	NaN
	2007-01-02 12:00:00	30.104193	-10.060086	80.5
	2007-01-03 12:00:00	30.104193	-10.060086	80.5
	2007-01-04 12:00:00	30.104193	-10.060086	80.5
	2007-01-05 12:00:00	30.104193	-10.060086	79.0

We use again hvplot to validate our search results.

df_wekeo.hvplot.points(
    x="lon",
    y="lat",
    c="SWI_010",
    groupby="time",
    x_sampling=0.16,
    y_sampling=0.16,
    rasterize=True,
    crs=ccrs.PlateCarree(),
    tiles=True,
    cmap=SSM_CMAP,
    clim=(0, 100),
    frame_width=500,
    clabel="Soil Water Index",
)

15.7 What is the Soil Water Index?

We have not yet talked about what the data exactly represents. The Soil Water Index is derived from surface soil moisture. The current dataset is derived from H SAF ASCAT 12.5 km and represents modelled approximation of soil water content at deeper layers of the soil. In the above plot we selected moisture content at a 10 cm depth interval. Remember that ASCAT can only probe the surface of soils.

The Soil Water Index provides a measure of how wet or dry the soil is over a specific area, ranging between 0 and 1. It is often derived using a combination of satellite data and modeling techniques. One commonly used approach to derive the SWI is through the TU Wien model¹

The equation to derive the SWI typically involves a convolution of surface soil moisture observations with an exponential filter. This process helps in estimating the root-zone soil moisture, which is more relevant for agricultural and hydrological applications. The general form of the equation can be described as follows:

\[ \text{SWI} = \frac{\sum_{i=0}^{n} w_i \cdot SSM_i}{\sum_{i=0}^{n} w_i} \]

Where:

\[ \begin{aligned} \text{SWI} &\quad \text{: Soil Water Index} \\ \text{SSM}_i &\quad \text{: surface soil moisture at time step} i \\ w_i &\quad \text{: the weight assigned to the surface soil moisture at time step} i \\ n &\quad \text{: number of time steps considered in the convolution} \end{aligned} \]

The weights \(w_i\) are often determined using an exponential decay function, which gives more importance to recent observations. The exponential decay function can be expressed as:

\[ w_i = e^{-\frac{t_i}{T}} \]

Where: \[ \begin{aligned} t_i &\quad \text{: the time difference between the current time step and the time step} i \\ T &\quad \text{: a characteristic time scale that determines the rate of decay of the weights} \end{aligned} \]

This approach allows the SWI to reflect the cumulative effect of surface soil moisture over a period, providing a more stable and representative measure of the soil moisture conditions in the root zone.

One will observe that the SWI at depth is often less variable (more stable) than contemporaneous surface soil moisture with only delayed and dampened responses to external forcing (e.g. precipitation). We can visualize this by applying the SWI algorithm to one of the five timeseries introduced in notebook 1.

Let’s select only the year 2024 from the H SAF ASCAT 6.25 km surface soil moisture for Buzi:

sm_ts = (
    pd.read_csv(
        make_url("ascat-6_25_ssm_timeseries.csv"),
        index_col="time",
        parse_dates=True,
    )
    .loc["2024"]
    .dropna()
)

sm_ts = sm_ts[sm_ts.name == "Buzi"].sort_index()
sm_ts.head()

https://git.geo.tuwien.ac.at/api/v4/projects/1266/repository/files/ascat-6_25_ssm_timeseries.csv/raw?ref=main&lfs=true

	name	type	surface_soil_moisture	unit
time
2024-01-01 06:23:23.932999680	Buzi	ascat	48.61	%
2024-01-01 07:17:21.082001408	Buzi	ascat	51.29	%
2024-01-01 18:52:13.149000192	Buzi	ascat	43.34	%
2024-01-01 19:46:11.505000448	Buzi	ascat	41.71	%
2024-01-03 06:36:02.624999936	Buzi	ascat	53.57	%

Now we can use the TU Wien developed package pyswi to calculate time series SWI. First we have to prepare the numpy arrays for the SSM and Julian time stamps², like so:

swi_jd = sm_ts.index.to_julian_date().to_numpy().astype(np.float64)
sm = sm_ts["surface_soil_moisture"].to_numpy()

dtype = np.dtype([("sm_jd", np.float64), ("sm", np.float32)])
ssm_ts = unstructured_to_structured(
    np.hstack((swi_jd[:, np.newaxis], sm[:, np.newaxis])), dtype=dtype
)

Now we can calculate the SWI. Here we calculate it for depth intervals of 5, 50, and 100 cm.

t_value = np.array([5, 50, 100], dtype=np.int32)
swi_ts, gain_out = calc_swi_ts(ssm_ts, swi_jd, t_value=t_value)

After which, we convert the numpy arrays back to a pandas DataFrame and merge it with the original SSM data. This requires the Julian time steps to be converted back to a normal pandas datetime type.

swi_ts = pd.DataFrame(swi_ts)
swi_ts["time"] = pd.to_datetime(swi_ts["swi_jd"] - 2440587.5, unit="D")
ts = pd.merge_asof(sm_ts, swi_ts.set_index("time"), left_index=True, right_index=True)

Finally, we plot the results where see the lagged and dampened response of soil moisture content with increasing depth.

ts.hvplot.line(
    y=["surface_soil_moisture", "swi_5", "swi_50", "swi_100"], frame_width=800
)

W. Wagner, Lemoine, G., Rott, H., 1999, A Method for Estimating Soil Moisture from ERS Scatterometer and Soil Data, Remote Sensing of Environment, Volume 70-2, pp. 191-207, doi: S0034-4257(99)00036-X ↩︎
Julian time stamps, also known as Julian dates, are continuous counts of days and fractions of days since a specific starting point, commonly used in astronomy and scientific applications to measure time intervals precisely.↩︎