Drought Monitoring with H SAF ASCAT Surface Soil Moisture Retrievals
10.1 Overview
In this notebook we will examine the capabilities of the H SAF Advanced Scatterometer (ASCAT) to monitor droughts in Mozambique. ASCAT instruments are flown onboard the Metop satellites (EUMETSAT1), which have been in orbit since 2007. These operational meteorological missions have yielded a continuous record of microwave backscattering and will continue to produce data in the years to come. The longevity of the ASCAT microwave backscatter record therefore makes it well suited to track climate change, such as soil moisture drying trends and droughts, over Mozambique. The surface soil moisture (SSM) product showcased here is available at a sampling distance of 6.25 km, meaning that one soil moisture value is available for every 50 km\(^2\) (5000 ha).
More information about microwave backscattering and the fundamentals of retrieving surface soil moisture from backscatter signatures can be found in this video:
10.2 Imports

import cartopy.crs as ccrs
import holoviews as hv
import hvplot.pandas  # noqa
import numpy as np
import pandas as pd
from bokeh.models import FixedTicker
from envrs.download_path import make_url
from envrs.ssm_cmap import SSM_CMAP
10.3 Plotting of Spatial Soil Moisture Maps
Let us start by having a look at monthly aggregated SSM derived from ASCAT microwave backscattering over Mozambique. We can easily load the CSV file with pandas from the web, like so:
url = make_url("ascat-6_25_ssm_monthly.csv")
df = pd.read_csv(
    url,
    index_col=["time", "location_id"],
    parse_dates=["time"],
)
df.head()
| time | location_id | longitude | latitude | surface_soil_moisture | zscore |
|---|---|---|---|---|---|
| 2007-01-01 | 10234428 | 32.357162 | -26.855516 | 21.400 | -0.227513 |
| 2007-01-01 | 10235038 | 32.621094 | -26.849566 | 22.970 | -0.263700 |
| 2007-01-01 | 10235648 | 32.885020 | -26.843615 | 21.355 | -0.524645 |
| 2007-01-01 | 10236025 | 32.457973 | -26.839937 | 20.990 | 0.048931 |
| 2007-01-01 | 10236635 | 32.721905 | -26.833986 | 29.640 | 0.100799 |
The data is presented in a long format, where each combination of time and location_id represents a unique observation. The coordinates latitude (vertical position) and longitude (horizontal position) indicate the location on the Earth's surface. These geospatial points are one of the simplest types of geospatial data, known in the Climate and Forecast (CF) Conventions as "Point Data"2. This format is well suited for timeseries, which are important for detecting droughts.
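Because the dataframe carries a MultiIndex of time and location_id, individual point timeseries can be pulled straight out of this long format. Here is a minimal sketch, using location_id 10234428 from the preview above purely as an example:

# Extract the timeseries of a single grid point from the long format.
# The location_id is taken from the df.head() preview and is only an example.
ssm_single = df.xs(10234428, level="location_id")
ssm_single.surface_soil_moisture.describe()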
We will plot the results using hvplot.points. Hvplot handles the scattered locations through the rasterize parameter, which resamples the data onto a selected grid representation: an equirectangular projection (a.k.a. the 'plate carrée' projection) in this instance. By grouping the data by "time", we create an interactive plot that allows us to scroll through all the months from 2007 to the present. Additionally, we overlay the SSM values on an OpenStreetMap (OSM) basemap. For convenience, we have included the locations of the in-situ sensors placed in each target district of the DrySAT project.
SSM values are reported as the degree of saturation, indicating how much of the soil pore space is filled with water. This means values can range from 0% (completely dry soil) to 100% (fully saturated soil). If you would like to obtain absolute values of soil moisture, i.e., how much water is available in the soil in m\(^3\)/m\(^3\), you can multiply the degree of saturation by the soil porosity of your location. This is further explained in the following notebook.
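As a minimal, illustrative sketch of this conversion (not needed for the plots below), the snippet multiplies the degree of saturation by an assumed porosity of 0.45 m\(^3\)/m\(^3\); this value is purely an assumption and should be replaced with the porosity of your own site.

# Convert degree of saturation (%) to volumetric soil moisture (m^3/m^3).
porosity = 0.45  # assumed, illustrative soil porosity in m^3/m^3
df["volumetric_soil_moisture"] = df["surface_soil_moisture"] / 100 * porosity
df["volumetric_soil_moisture"].head()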
locations = {
    "Muanza": {"latitude": -18.9064758, "longitude": 34.7738921},
    "Chokwé": {"latitude": -24.5894393, "longitude": 33.0262595},
    "Mabote": {"latitude": -22.0530427, "longitude": 34.1227842},
    "Mabalane": {"latitude": -23.4258788, "longitude": 32.5448211},
    "Buzi": {"latitude": -19.9747305, "longitude": 34.1391065},
}
df_locs = pd.DataFrame.from_dict(locations, "index").reset_index()

points = df_locs.hvplot.points(
    x="longitude", y="latitude", color="black", crs=ccrs.PlateCarree()
)
labels = df_locs.hvplot.labels(
    x="longitude",
    y="latitude",
    text="index",
    text_baseline="bottom",
    text_color="black",
    crs=ccrs.PlateCarree(),
)

df.hvplot.points(
    x="longitude",
    y="latitude",
    c="surface_soil_moisture",
    groupby="time",
    x_sampling=0.08,
    y_sampling=0.08,
    rasterize=True,
    crs=ccrs.PlateCarree(),
    tiles=True,
    cmap=SSM_CMAP,
    clim=(0, 100),
    frame_width=500,
    clabel="Surface soil moisture (%)",
) * points * labels
10.4 Plotting of Surface Soil Moisture Timeseries
Now let us have a closer look at the five locations marked on the SSM map and plot the SSM values against time for these locations, known as timeseries. To do this, we have already filtered the full dataset down to only these five locations for you. We read the timeseries by importing the CSV file with pandas. The CSV file contains the SSM timeseries for the locations Buzi, Mabalane, Muanza, Mabote, and Chokwé at the full temporal resolution of the product. To visualize this, we highlight the density of data points falling in a certain sector of the plot with blue shading, where bluer values mark a higher density of data points.
ts = pd.read_csv(
    make_url("ascat-6_25_ssm_timeseries.csv"),
    index_col="time",
    parse_dates=True,
)

ts.hvplot.scatter(
    x="time",
    y="surface_soil_moisture",
    groupby="name",
    rasterize=True,
    dynspread=True,
    threshold=1,
    frame_width=800,
    padding=(0.01, 0.1),
    clabel="Density of data",
)
The cyclical seasonal pattern from dry to wet can easily be discerned from the timeseries. Note, however, that we do not track precipitation but rather the change from wet to dry soils. Moreover, we can see that the cyclical pattern occasionally breaks down, as in the years 2015 and 2016. Chokwé, in particular, displays a complete lack of wet soils during the 2016 rainy season. We can remove some of the noise in the records by aggregating the values on a monthly basis, as shown in the following code chunk. Here, the pandas dataframe method groupby groups the timeseries by successive months, using the pandas Grouper(freq="ME") together with the location name. We can then plot the data monthly, color coded per location name, as follows:
ts_monthly = (
    ts.groupby([pd.Grouper(freq="ME"), "name"])
    .surface_soil_moisture.mean()
    .reset_index(level="name")
)

ts_monthly.hvplot.line(
    x="time",
    y="surface_soil_moisture",
    by="name",
    frame_width=800,
    padding=(0.01, 0.1),
)
In these monthly aggregated timeseries we can more easily investigate the temporal dynamics per location. Note that we are still looking at the degree of saturation, which for each location varies between 0 and 100%. So, these timeseries do not give us information on the absolute differences in SSM between locations; for that we would need information on the soil porosity of each site. However, we can use the timeseries at each location to look at temporal dynamics, such as changes in the amplitude, or the magnitude of change, of SSM. This is of greater importance for drought detection, as we can see whether a change in SSM during a specific time is "normal" or "unusual" for this specific location when compared to other years.
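As one possible, purely illustrative way to quantify such dynamics, the snippet below computes a yearly amplitude (maximum minus minimum monthly SSM) per location from ts_monthly; other definitions of amplitude are equally valid.

# Yearly amplitude of monthly SSM per location: max minus min within each year.
amplitude = (
    ts_monthly.groupby([ts_monthly.index.year, "name"])
    .surface_soil_moisture.agg(lambda x: x.max() - x.min())
    .rename("amplitude")
)
amplitude.head()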
10.5 Normalization and Anomaly Detection
To investigate how “unusual” certain periods are, we can calculate Z-score statistics.
\[ z_{k,i} = \frac{\text{SM}_{k,i} - \bar{\text{SM}}_i}{s^{\bar{\text{SM}}_i}} \]
where:
\[
\begin{align}
\text{SM}_{k,i} &: \text{soil moisture for a specific period (e.g., month) } i \text{ and year } k \\
\bar{\text{SM}}_i &: \text{long-term mean soil moisture for a specific period (e.g., month) } i \\
s^{\bar{\text{SM}}_i} &: \text{long-term standard deviation of soil moisture for a specific period (e.g., month) } i
\end{align}
\]
The Z score statistic is an approach to detect anomalies in timeseries, where one measures how far a datapoint (\(\text{SM}_{k,i}\)) is removed from the long-term mean (\(\bar{\text{SM}}_i\)). This distance from the mean by itself is not all that useful, as it depends on the location's average SSM. To circumvent this, and to more easily compare timeseries of different locations, we divide the distance from the mean by a measure of the variation of the timeseries, such as the standard deviation (\(s^{\bar{\text{SM}}_i}\)).
def zscore(x: pd.Series) -> pd.Series:
    """Z score.

    Parameters
    ----------
    x : pd.Series
        Monthly aggregated surface soil moisture values

    Returns
    -------
    Z scores : pd.Series
    """
    return (x - x.mean()) / x.std()
We exemplify this normalization step below. Here we can see two histograms for a simulated SSM dataset. The histogram on the top is still in the original "degree of saturation" units, whereas the one on the bottom is transformed to Z scores. A value on the x axis of the lower histogram can be read as: "This point is this many standard deviations away from the mean."
rng = np.random.default_rng()  # random generator (pass a seed to make this reproducible)
mu, sigma = 50, 10  # mean and standard deviation
random_ts = pd.Series(rng.normal(mu, sigma, 100))

(
    random_ts.hvplot.hist(xlabel="SSM (%)")
    + zscore(random_ts).hvplot.hist(xlabel="Z score")
).cols(1).opts(shared_axes=False)
10.6 Drought Anomaly Detection
For our timeseries, we will now calculate the Z scores per month. First, we calculate the long-term average over all Januaries and compare it to the monthly aggregated SSM value, then do the same for February, and so on for every month. This operation is similar to the previous groupby operation, but now we use the datetime accessor month to accumulate monthly averages. We then use transform to calculate the Z score on the pandas column with our previously defined function zscore.

ts_monthly["zscore"] = ts_monthly.groupby(
    [ts_monthly.index.month, "name"]
).surface_soil_moisture.transform(zscore)

ts_monthly.hvplot.line(
    x="time",
    y="zscore",
    by="name",
    frame_width=800,
    padding=(0.01, 0.1),
)
In the last plot we can now clearly discern the drought of 2015/2016, but also other droughts, such as during the years 2019/2020. The Z score also appears to indicate drier than usual conditions in 2024/2025. The Z score can be translated into drought severity levels, classifying conditions as "mild", "moderate", "severe", or "extreme". We will talk more about these drought severity levels in notebook 3.
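To make this concrete, here is a minimal sketch that bins the monthly Z scores into severity labels. The bin edges mirror the colorbar ticks used for the map in the next section; they are illustrative only, and the proper thresholds are discussed in notebook 3.

# Illustrative classification of Z scores into drought severity levels.
severity_bins = [-np.inf, -2.5, -2.0, -1.5, -1.0, 0.0]
severity_labels = ["extreme", "severe", "moderate", "mild", "normal"]
ts_monthly["severity"] = pd.cut(
    ts_monthly["zscore"], bins=severity_bins, labels=severity_labels
)
# Z scores above 0 (wetter than normal) fall outside the bins and become NaN.
ts_monthly["severity"].value_counts()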
10.7 Monitoring Drought in Time and Space
As a last step, we can now apply this approach to the whole of Mozambique. Here we have already calculated the Z scores and we just have to plot them.
colorbar_opts = {
    "major_label_overrides": {
        -2.5: "Extreme",
        -2: "Severe",
        -1.5: "Moderate",
        -1: "Mild",
        0: "Normal",
    },
    "ticker": FixedTicker(ticks=[-2.5, -2, -1.5, -1, 0]),
}

df[df.zscore <= 0].hvplot.points(
    x="longitude",
    y="latitude",
    c="zscore",
    groupby="time",
    x_sampling=0.08,
    y_sampling=0.08,
    rasterize=True,
    crs=ccrs.PlateCarree(),
    tiles=True,
    cmap="reds_r",
    clim=(-3, 0),
    frame_width=500,
    clabel="Drought anomaly",
).opts(hv.opts.Image(colorbar_opts={**colorbar_opts})) * points * labels
This spatiotemporal analysis (in time and space) confirms that the 2015/2016 drought was particularly pronounced in the south of the country, in the region surrounding Chokwé. However, this intense drought was also prevalent in the northern districts neighbouring Malawi, something that would not be seen in a point-wise analysis of selected locations.
In the next notebooks, we will compare this microwave-based technique to other indicators of drought, such as the Standardized Precipitation Evapotranspiration Index (SPEI) and vegetation-based indicators (NDVI).
10.8 Appendix: Fibonacci Grid
The H SAF ASCAT SSM data is presented on an irregular grid known as the Fibonacci grid (González, 2010)3. This grid offers advantages, e.g., for statistical calculations on the data, by reducing errors caused by distortions and non-uniform weighting of point contributions. For instance, when comparing areas over a wide latitudinal range, a regular longitude-latitude grid can lead to unequal weighting, as data points become more densely packed farther from the equator (see the table below). Working with irregular grids can, however, be more challenging, so we provide some useful techniques to simplify your work; a minimal sketch of how such a Fibonacci lattice can be constructed follows the table.
| Feature | Regular Grids | Irregular Grids |
|---|---|---|
| Structure | Uniform spacing and cell size | Variable spacing and cell size |
| Data Distribution | Evenly distributed data points | Densely placed data points in areas of interest |
| Computational Efficiency | Highly efficient for processing and visualization | More complex and time-consuming computations |
| Benefits | Simplicity; computational efficiency; ease of implementation | Detailed representation; flexibility; accurate spatial heterogeneity |
| Downsides | May lack detail in complex areas; uneven sampling weights in statistics | Increased computational complexity; more challenging to implement |
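As mentioned above, here is a minimal sketch of how a Fibonacci lattice of near-equal-area points can be generated on the sphere, loosely following González (2010); the point count and construction are illustrative and do not reproduce the exact H SAF 6.25 km grid.

# Fibonacci lattice: 2N + 1 near-equal-area points on the sphere.
golden_ratio = (1 + np.sqrt(5)) / 2
N = 1000
i = np.arange(-N, N + 1)
latitudes = np.degrees(np.arcsin(2 * i / (2 * N + 1)))
longitudes = (i / golden_ratio % 1) * 360 - 180  # wrap to [-180, 180)
fib_grid = pd.DataFrame({"latitude": latitudes, "longitude": longitudes})
fib_grid.hvplot.scatter(x="longitude", y="latitude", frame_width=500)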
European Organisation for the Exploitation of Meteorological Satellites↩︎
The Climate and Forecast (CF) Conventions are standards for NetCDF files that define metadata, including variable names, units, and other attributes, to ensure interoperability and consistency in climate and forecast data handling.↩︎
Á. González, 2010, Measurement of Areas on a Sphere Using Fibonacci and Latitude-Longitude Lattices, Math Geosci, vol. 42, no. 1, pp. 49-64, doi: 10.1007/s11004-009-9257-x.↩︎