from pathlib import Path
import cartopy.crs as ccrs
import hvplot.pandas # noqa
import numpy as np
import pandas as pd
import xarray as xr
from dotenv import dotenv_values
from hda import Client, Configuration
from numpy.lib.recfunctions import unstructured_to_structured
from pyswi.swi_ts import calc_swi_ts
from envrs.download_path import make_url
from envrs.ssm_cmap import SSM_CMAP
Downloading, Reading, and Working with Copernicus WEkEO Soil Water Index 12.5 km
15.1 Overview
In this notebook we will examine how one can obtain historic soil moisture data. We will do this by looking into the Copernicus WEkEO portal.
15.2 Imports
15.3 Copernicus WEkEO
The name WEkEO (pronounced [wikio]) refer to the four key organisations – EUMETSAT, ECMWF, Copernicus Programme, and Mercator Ocean International, which link to the broader user community (“WE”), with the purpose of knowledge advancement and sharing (“k”), focussing Earth and Environmental Observation (“EO”).
The platform is a web-based service that helps people easily access and use Earth observation data. Its main goal is to make it simple for researchers, companies, and decision-makers to get the data they need and analyze it without needing advanced tools or a lot of technical knowledge.
15.4 Registration
Similarly to H SAF FTP server, we need to register as an user to make full use of WEkEO, although some features are available without an account. We can follow these steps:
- Go to the homepage of WEkEO.
- Click on top right “Login” button, whereafter we select “Create an account” and fill out the form.
- Now we can login and, for example, look at the available data by clicking on “Go” in the “Data Catalogue” field in the middle of the screen.
- This shows us a data viewer where we can select from many different data products.
We will use the latter step to find the datasets we need and create query strings that we will use in a Python environment.
15.5 Harmonized Data Access
The Copernicus WEkEO platform offers a straightforward way to access data through its Application Programming Interface (API). APIs enable different programming languages to communicate and exchange information, effectively serving as a bridge for programs to utilize and extend each other’s functionalities. In the case of WEkEO, the API is called the Harmonized Data Access (HDA) API, which provides uniform access to the entire WEkEO catalogue, including subsetting and downloading capabilities. We will use the Python HDA client to interact with this API.
Install with either pip:
pip install hda
or conda:
conda install -c conda-forge hda
Now, we need to authenticate using the Configuration
object and provide your account’s username and password. As in the previous notebook, we will use a dotenv environment for this purpose. To utilize dotenv, you need to add your username to the USER_WEKEO
variable and your password to the PASS_WEKEO
variable in the .env
file.
= Configuration(
conf =dotenv_values(".env")["USER_WEKEO"],
user=dotenv_values(".env")["PASS_WEKEO"],
password
)= Client(config=conf) hda_client
15.6 Finding Soil Moisture Data
To find the SWI 12.5 km dataset in time series format, we have to return to the data viewer in the browser. Click in the “Layers” field on the +
sign, and search for “Soil Water Index” in the “Free Text Search”.
Then select the fourth dataset from the top; “Soil Water Index Time Series 2007-present (discrete global grid), global, daily - version 3”. Click on the download button in the layers field, which unfolds a form which allows subsetting of the data. We can enter here a bounding box (bbox) for Mozambique. Scroll further down and click on “Show API request(s)”.
We will copy this query to our Python environment and use the hda_client
to search for the files.
= {
query "dataset_id": "EO:CLMS:DAT:CLMS_GLOBAL_SWI_12.5KM_V3_TIME-SERIES_NETCDF",
"bbox": [
30.315,
-27.49,
41.07,
-10.20,
],"itemsPerPage": 200,
"startIndex": 0,
}
= hda_client.search(query)
matches
print(matches)
SearchResults[items=3,volume=312MB]
If you are content with the results then you can start the downloading process. Here we again create a directory to contain the downloaded files. The downloaded file with be in NetCDF format. NetCDF is a data formats that is tailored to multi-dimensional array-oriented scientific data with many options to label the data.
%%capture
= "cgls_swi_12_5"
local_path
if not Path(local_path).is_dir():
Path(local_path).mkdir()
=local_path) matches.download(download_dir
The downloaded data is projected on the same Fibonacci grid as the H SAF ASCAT 6.25 km surface soil moisture (see notebook 1). But the NetCDF data format requires another solution to read the data. Hence we will use another package to read the data. We use xarray
, which is optimised to work with multi-dimensional arrays. We will introduce this package into more detail later on. We will call the method to_dataframe()
to turn the xarray.Dataset
into a pandas.DataFrame
.
def _preprocess(ds: xr.Dataset) -> xr.DataArray:
"""Preprocess Dataset.
Parameters
----------
ds: Xarray.Dataset
Individually loaded NetCDF
Returns
-------
Xarray.DataArray: _Data array containing only SWI from depth 10 cm
"""
return ds.SWI_010
= xr.open_mfdataset(
df_wekeo "cgls_swi_12_5/*.nc",
="h5netcdf",
engine="nested",
combine=True,
parallel=-1,
chunks=_preprocess,
preprocess="outer",
join="no_conflicts",
compat
).to_dataframe()
df_wekeo.head()
lon | lat | SWI_010 | ||
---|---|---|---|---|
locations | time | |||
565647 | 2007-01-01 12:00:00 | 30.104193 | -10.060086 | NaN |
2007-01-02 12:00:00 | 30.104193 | -10.060086 | 80.5 | |
2007-01-03 12:00:00 | 30.104193 | -10.060086 | 80.5 | |
2007-01-04 12:00:00 | 30.104193 | -10.060086 | 80.5 | |
2007-01-05 12:00:00 | 30.104193 | -10.060086 | 79.0 |
We use again hvplot
to validate our search results.
df_wekeo.hvplot.points(="lon",
x="lat",
y="SWI_010",
c="time",
groupby=0.16,
x_sampling=0.16,
y_sampling=True,
rasterize=ccrs.PlateCarree(),
crs=True,
tiles=SSM_CMAP,
cmap=(0, 100),
clim=500,
frame_width="Soil Water Index",
clabel )
15.7 What is the Soil Water Index?
We have not yet talked about what the data exactly represents. The Soil Water Index is derived from surface soil moisture. The current dataset is derived from H SAF ASCAT 12.5 km and represents modelled approximation of soil water content at deeper layers of the soil. In the above plot we selected moisture content at a 10 cm depth interval. Remember that ASCAT can only probe the surface of soils.
The Soil Water Index provides a measure of how wet or dry the soil is over a specific area, ranging between 0 and 1. It is often derived using a combination of satellite data and modeling techniques. One commonly used approach to derive the SWI is through the TU Wien model1
The equation to derive the SWI typically involves a convolution of surface soil moisture observations with an exponential filter. This process helps in estimating the root-zone soil moisture, which is more relevant for agricultural and hydrological applications. The general form of the equation can be described as follows:
\[ \text{SWI} = \frac{\sum_{i=0}^{n} w_i \cdot SSM_i}{\sum_{i=0}^{n} w_i} \]
Where:
\[ \begin{aligned} \text{SWI} &\quad \text{: Soil Water Index} \\ \text{SSM}_i &\quad \text{: surface soil moisture at time step} i \\ w_i &\quad \text{: the weight assigned to the surface soil moisture at time step} i \\ n &\quad \text{: number of time steps considered in the convolution} \end{aligned} \]
The weights \(w_i\) are often determined using an exponential decay function, which gives more importance to recent observations. The exponential decay function can be expressed as:
\[ w_i = e^{-\frac{t_i}{T}} \]
Where: \[ \begin{aligned} t_i &\quad \text{: the time difference between the current time step and the time step} i \\ T &\quad \text{: a characteristic time scale that determines the rate of decay of the weights} \end{aligned} \]
This approach allows the SWI to reflect the cumulative effect of surface soil moisture over a period, providing a more stable and representative measure of the soil moisture conditions in the root zone.
One will observe that the SWI at depth is often less variable (more stable) than contemporaneous surface soil moisture with only delayed and dampened responses to external forcing (e.g. precipitation). We can visualize this by applying the SWI algorithm to one of the five timeseries introduced in notebook 1.
Let’s select only the year 2024 from the H SAF ASCAT 6.25 km surface soil moisture for Buzi:
= (
sm_ts
pd.read_csv("ascat-6_25_ssm_timeseries.csv"),
make_url(="time",
index_col=True,
parse_dates
)"2024"]
.loc[
.dropna()
)
= sm_ts[sm_ts.name == "Buzi"].sort_index()
sm_ts sm_ts.head()
https://git.geo.tuwien.ac.at/api/v4/projects/1266/repository/files/ascat-6_25_ssm_timeseries.csv/raw?ref=main&lfs=true
name | type | surface_soil_moisture | unit | |
---|---|---|---|---|
time | ||||
2024-01-01 06:23:23.932999680 | Buzi | ascat | 48.61 | % |
2024-01-01 07:17:21.082001408 | Buzi | ascat | 51.29 | % |
2024-01-01 18:52:13.149000192 | Buzi | ascat | 43.34 | % |
2024-01-01 19:46:11.505000448 | Buzi | ascat | 41.71 | % |
2024-01-03 06:36:02.624999936 | Buzi | ascat | 53.57 | % |
Now we can use the TU Wien developed package pyswi
to calculate time series SWI. First we have to prepare the numpy
arrays for the SSM and Julian time stamps2, like so:
= sm_ts.index.to_julian_date().to_numpy().astype(np.float64)
swi_jd = sm_ts["surface_soil_moisture"].to_numpy()
sm
= np.dtype([("sm_jd", np.float64), ("sm", np.float32)])
dtype = unstructured_to_structured(
ssm_ts =dtype
np.hstack((swi_jd[:, np.newaxis], sm[:, np.newaxis])), dtype )
Now we can calculate the SWI. Here we calculate it for depth intervals of 5, 50, and 100 cm.
= np.array([5, 50, 100], dtype=np.int32)
t_value = calc_swi_ts(ssm_ts, swi_jd, t_value=t_value) swi_ts, gain_out
After which, we convert the numpy
arrays back to a pandas DataFrame
and merge it with the original SSM data. This requires the Julian time steps to be converted back to a normal pandas datetime
type.
= pd.DataFrame(swi_ts)
swi_ts "time"] = pd.to_datetime(swi_ts["swi_jd"] - 2440587.5, unit="D")
swi_ts[= pd.merge_asof(sm_ts, swi_ts.set_index("time"), left_index=True, right_index=True) ts
Finally, we plot the results where see the lagged and dampened response of soil moisture content with increasing depth.
ts.hvplot.line(=["surface_soil_moisture", "swi_5", "swi_50", "swi_100"], frame_width=800
y )
W. Wagner, Lemoine, G., Rott, H., 1999, A Method for Estimating Soil Moisture from ERS Scatterometer and Soil Data, Remote Sensing of Environment, Volume 70-2, pp. 191-207, doi: S0034-4257(99)00036-X↩︎
Julian time stamps, also known as Julian dates, are continuous counts of days and fractions of days since a specific starting point, commonly used in astronomy and scientific applications to measure time intervals precisely.↩︎