import folium
import holoviews as hv
import hvplot.xarray # noqa
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
In this notebook, we will show the concept of extracting coefficients that describe seasonal patterns in Sentinel 1 radar backscatter variability. Namely, sine and cosine functions as harmonic oscillators are used to describe periodicities in the time series of, either VV or VH polarisations, backscatter. Those can then be removed from time series and what is left would generally be the noise or transient events, for example floods, volcano erruptions, and whatever is possible to detect with radar Earth Observation data.
C.1 Prerequisites
Concepts | Importance | Notes |
---|---|---|
Intro to xarray | Necessary | |
Intro to Flood mapping | Necessary | |
Documentation hvPlot | Helpful | Interactive plotting |
- Time to learn: 10 min
C.2 Imports
Loading sigma nought time series.
= xr.open_dataset(
timeseries_dc "simplecache::zip:///::https://huggingface.co/datasets/martinschobben/tutorials/resolve/main/harmonic-parameters.zarr.zip", # noqa
="zarr",
engine={
storage_options"simplecache": {"cache_storage": "/tmp/fsspec_cache/harmonic-parameters"}
},
) timeseries_dc
<xarray.Dataset> Size: 29kB Dimensions: (point: 2, time: 1214) Coordinates: * point (point) <U4 32B 'land' 'lake' spatial_ref int32 4B ... * time (time) datetime64[ns] 10kB 2015-01-05T17:03:45 ... 2022-12-3... x (point) float64 16B ... y (point) float64 16B ... Data variables: VH (point, time) float32 10kB ... VV (point, time) float32 10kB ...
The data that is loaded represents VV and VH backsatter polarisations, as detected by Sentinel-1 radar instrument. The two points of interest are on Sicily, nearby Lentini and Catania.
= 37.283606, 37.40621527385254
latmin, latmax = 14.826223, 15.109736519516783
lonmin, lonmax
= [
bounding_box
[latmin, lonmin],
[latmax, lonmax],
]
map = folium.Map(
=[
location+ latmax) / 2,
(latmin + lonmax) / 2,
(lonmin
],=9,
zoom_start=True,
zoom_control=False,
scrollWheelZoom=True,
dragging
)
folium.Rectangle(=bounding_box,
bounds="red",
colormap)
).add_to(
folium.Marker(=[37.37489461337563, 14.884886613876311],
location="Selected Pixel in the flooded land in 2018",
popup=folium.Icon(color="red"),
iconmap)
).add_to(
folium.Marker(=[37.32275297904196, 14.947068995810364],
location="Selected Pixel in lake Lentini",
popup=folium.Icon(color="red"),
iconmap)
).add_to(
map
Let’s plot time series of those two points.
= pd.to_datetime("2018-05-17")
event_date
= timeseries_dc.sel(point="lake").VV.hvplot(
lake_curve ="Lake Lentini VV",
label=800,
width=300,
height="navy",
color="Sigma0 VV (dB)",
ylabel="Time",
xlabel="Lake Lentini Pixel",
title
)
= timeseries_dc.sel(point="land").VV.hvplot(
land_curve ="Land Pixel VV",
label=800,
width=300,
height="olive",
color="Sigma0 VV (dB)",
ylabel="Time",
xlabel="Flooded Land Pixel",
title
)
= hv.VLine(event_date).opts(color="red", line_dash="dashed", line_width=2)
event_line
= lake_curve * event_line
lake_plot = land_curve * event_line
land_plot
+ land_plot).cols(1) (lake_plot
C.3 The Concept of Harmonic Parameters
C.3.1 One Harmonic in Traditional Form
A single harmonic is an oscillatory function, which can be expressed as:
\[ f(t) = A \cos \left( \frac{2\pi}{n} t + \phi \right) \]
where: - $ A $ is the amplitude of the harmonic, - $ $ is the phase shift in radians, - $ n $ is the period in units of time, - $ 2/n $ angular frequency.
The amplitude here can represent a physical quantitiy of interest, for instance temperature, radar backscatter, soil moisture, etc. In a way, anything can be represented as signal and signal processing can be therefore applied to many different scientific fields.
# A simple harmonic oscillator
= np.linspace(0, 300, 2000)
time = 1.0
amplitude = 20.0
period = 0.0
phi = amplitude * np.cos((2 * np.pi / period) * time + phi)
y
"Time", "Amplitude").opts(
hv.Curve((time, y), ="Simple Cosine Plot", width=1000, height=600, line_width=2
title )
Now, if we have measure a physical quantity over a long time period, for example temperature of some region, we have a time series. A harmonic regression is a least-square-fit of a harmonic function to the complex signal - or time series. In regard to microwave backscattering time series this property can be utilized to represent seasonal patterns caused by vegetation. This way we can filter out the vegetation signal from our microwave backscattering time series - to either better understand the physics behind this harmonic or to better detect events that don’t seasonally repeat, like flood events.
Harmonic parameters would be input parameters that define such fitted harmonic components, in this case: amplitude, shifting phase and period of an oscillating function. However, the period and starting phase are inside the non-linear (sinusoidal) function, so a linearisation has to be done, as those parameters are going to be estimated with a least-square regression algorithm. In our case we will only estimate phase shift and amplitude, not the period of the harmonics.
Using the angle sum identity:
\[ \cos(x + y) = \cos x \cos y - \sin x \sin y \]
we expand:
\[ A \cos \left( \frac{2\pi t}{n} + \phi \right) = A \left[ \cos \phi \cos \left( \frac{2\pi t}{n} \right) - \sin \phi \sin \left( \frac{2\pi t}{n} \right) \right] \]
C.3.1.1 Defining Coefficients $ c_i $ and $ s_i $
Now, we can define the coefficients, that have units of a physical quantity (amplitude, such as radar backscatter cross sections):
\[ c = A \cos \phi, \quad s = - A \sin \phi \]
so that the equation becomes:
\[ A \cos \left( \frac{2\pi t}{n} + \phi \right) = c \cdot \cos \left( \frac{2\pi t}{n} \right) + s \cdot \sin \left( \frac{2\pi t}{n} \right) \]
We can then extract the starting phase information outside of the sinusoidal function. The period information is still there, but only because in this case it is not estimated in least-square process, but predetermined.
C.3.1.2 Generalizing to $ k $ Harmonics
A complex signal is generally summation of many basic harmonic terms. Summing over all harmonics, we obtain:
\[ f(t) = f^0 + \sum_{i=1}^{k} \left[ c_i \cos \left( \frac{2\pi i t}{n} \right) + s_i \sin \left( \frac{2\pi i t}{n} \right) \right] \]
where:
- $ f^0 $ is the mean function value,
- $ c_i = A_i _i $ and $ s_i = - A_i _i $ are the harmonic coefficients.
In this form different periodicities are covered, for example with $ i = 1, 2, … k $, we can have periods of $ , $, and so on.
# Simulation of complex signal with many harmonics
= np.linspace(0, 15, 1000)
t = 3
k
= [
coefficients "A": 3, "B": 2, "n": 2, "phi": 0},
{"A": 1.5, "B": 0.5, "n": 5, "phi": np.pi / 4},
{"A": 0.8, "B": 1.2, "n": 8, "phi": np.pi / 2},
{
]
= ["firebrick", "navy", "seagreen"]
colors
= []
harmonics = np.zeros_like(t)
signal_sum
for coeff in coefficients:
= coeff["A"] * np.cos(
harmonic 2 * np.pi * coeff["n"] * t) / 10 + coeff["phi"]
(+ coeff["B"] * np.sin((2 * np.pi * coeff["n"] * t) / 10 + coeff["phi"])
)
harmonics.append(harmonic)+= harmonic
signal_sum
= max(np.max(np.abs(h)) for h in harmonics + [signal_sum])
max_amp
= plt.subplots(k + 1, 1, figsize=(6, 8), sharex=True)
fig, axes
for i in range(k):
=f"Harmonic {i + 1}", color=colors[i])
axes[i].plot(t, harmonics[i], label
axes[i].legend()
axes[i].grid()-max_amp, max_amp)
axes[i].set_ylim(
axes[k].plot(
t,
signal_sum,="Summed Signal = Harmonic 1 + Harmonic 2 + Harmonic 3",
label="black",
color
)
axes[k].legend()
axes[k].grid()"Time")
axes[k].set_xlabel(-max_amp, max_amp)
axes[k].set_ylim(
plt.tight_layout() plt.show()
C.3.2 Recovering Amplitude and Phase
Now that we have estimated the harmonic coefficients $ c_i \(** and **\) s_i $ wit the harmonic regression, we can recover the original amplitude and phase for a physical interpretation, as:
\[ A_i = \sqrt{c_i^2 + s_i^2} \]
\[ \phi_i = \tan^{-1} \left( -\frac{s_i}{c_i} \right) \]
C.3.3 Harmonic Model Equation for Radar Backscatter in Flood Detection Algorithm
The harmonic model function is given by:
\[ \widehat{\sigma}^0 (t_{doy}) = \sigma^0 + \sum_{i=1}^{k} \left\{ c_i \cos \left( \frac{2\pi i }{n} t_{doy} \right) + s_i \sin \left( \frac{2\pi i}{n} t_{doy} \right) \right\} \]
where:
\[ \sigma^0 &\quad \text{is the effective mean radar backscatter,} \\ \widehat{\sigma}^0 (t_{doy}) &\quad \text{is the estimated radar backscatter at time } t, \\ t_{doy} &\quad \text{is the time instance (as a day of a year),} \\ n &= 365 \text{ days} \\ c_i, s_i &\quad \text{are harmonic coefficients for } i = 1, 2, ..., k, \\ k &\quad \text{is the number of harmonic iterations}. \]
Let’s define a function that will fit a model like this with a least squares method, on a xarray
array. Of course, the initial harmonic parameters first need to be estimated or known and their number depends on \(k\).
def build_initial_parameters(array, k):
"""
Constructs initial parameters and their names for harmonic curve fitting
with option to choose number of k harmonics. Needed for
xarray.DataArray.curvefit
Parameters
----------
array : xarray.DataArray
The input 1D time series data for which the harmonic model is being
fitted.
k : int
Number of harmonics to include in the model. For each harmonic, two
parameters
(cosine and sine coefficients) will be added: 'c1', 's1', ..., 'ck',
'sk'.
Returns
-------
param_names : list of str
A list of parameter names in the order expected by the harmonic model
function.
Format: ['mean', 'c1', 's1', ..., 'ck', 'sk'].
p0 : dict
A dictionary containing initial guesses for each parameter.
The mean is initialized from the data, and all harmonic coefficients are
set to 1.0.
"""
= float(array.mean().values)
mean_val
= ["mean"]
param_names for i in range(1, k + 1):
+= [f"c{i}", f"s{i}"]
param_names
= {"mean": mean_val}
p0 for name in param_names[1:]:
= 1.0
p0[name]
return param_names, p0
def harmonic_model(t, mean, *coef):
"""
Harmonic model function for fitting periodic components in time series data.
To be passed in xarray.DataArray.curvefit as func argument
This function computes a sum of sine and cosine terms up to a specified
number of harmonics. The number of harmonics k is inferred from the length
of the coef argument (must be 2 * k). The time variable t is expected to
be in nanoseconds, e.g., from datetime64[ns] converted to int.
Parameters
----------
t : array-like or float
Time values (in nanoseconds) over which to evaluate the harmonic model.
This should match the time coordinate used in the original dataset,
converted to integers via .astype('int64').
mean : float
The mean (baseline) value of the signal to which the harmonic components
are added.
*coef : float
Variable-length list of harmonic coefficients, ordered as:
[c1, s1, c2, s2, ..., ck, sk], where k = len(coef) // 2.
Each `ci` and `si` corresponds to the cosine and sine coefficients for
the i-th harmonic.
Returns
-------
result : array-like or float
The computed harmonic model values corresponding to the input t.
Notes
-----
The fundamental frequency is assumed to be one cycle per year. The time
normalization is based on the number of nanoseconds in a year (365 * 24 * 60
* 60 * 1e9).
"""
= 365
n = mean
result
= len(coef) // 2 # Number of harmonics
k
for i in range(1, k + 1):
= coef[2 * (i - 1)]
c_i = coef[2 * (i - 1) + 1]
s_i += c_i * np.cos(2 * np.pi * i * t / n) + s_i * np.sin(
result 2 * np.pi * i * t / n
)
return result
C.3.4 Harmonic Function Fitting
Now, the two time series can be selected and the coefficients can be estimated. We pick the VV polarisation for a land pixel and 3 harmonics (7 parameters).
= timeseries_dc.sel(point="land").VH
land_VV_series = build_initial_parameters(land_VV_series, k=3)
param_names, p0
= land_VV_series.curvefit(
fit_result ="time.dayofyear",
coords=harmonic_model,
func=param_names,
param_names="time",
reduce_dims
)
fit_result
<xarray.Dataset> Size: 820B Dimensions: (param: 7, cov_i: 7, cov_j: 7) Coordinates: point <U4 16B 'land' spatial_ref int32 4B 27704 x float64 8B 5.022e+06 y float64 8B 4.333e+05 * param (param) <U4 112B 'mean' 'c1' 's1' 'c2' 's2' 'c3' 's3' * cov_i (cov_i) <U4 112B 'mean' 'c1' 's1' 'c2' 's2' 'c3' 's3' * cov_j (cov_j) <U4 112B 'mean' 'c1' 's1' 'c2' 's2' 'c3' 's3' Data variables: curvefit_coefficients (param) float64 56B -21.87 0.9572 ... 0.2654 0.1653 curvefit_covariance (cov_i, cov_j) float64 392B 0.005444 ... 0.0108
Let’s extract and print estimated harmonic parameters for this pixel
= fit_result.curvefit_coefficients.values
estimated_params
for name, val in zip(fit_result.param.values, estimated_params):
print(f"{name: >6}: {val: .4f}")
mean: -21.8654
c1: 0.9572
s1: 1.2286
c2: -0.6606
s2: -1.4352
c3: 0.2654
s3: 0.1653
Now, these coefficient can be used to construct a total harmonic signal.
# Extract estimated harmonic parameters and reconstruct a signal as xarray dataaray
= estimated_params[0]
mean = estimated_params[1:]
coeffs
= harmonic_model(timeseries_dc["time.dayofyear"], mean, *coeffs)
fitted_vals
= xr.DataArray(
fitted_da ={"time": land_VV_series.time}, dims="time", name="Harmonic Fit"
fitted_vals, coords
)
# Plot the data
= land_VV_series.hvplot(
plot ="Original", color="forestgreen", alpha=1
label* fitted_da.hvplot(label="Harmonic Fit", color="darkorange", line_width=2.5)
)
plot.opts(="Harmonic Model Fit to VV Timeseries over Land",
title="Time",
xlabel="VV backscatter",
ylabel=900,
width=400,
height )
We can see cycles that happen once a year, twice a year and three times a year, so it makes sense to work with time series over several years. Important parameter is a number of observations (NOBS) - the more observations the better. It directly relates to the uncertainty of our estimate. In this respect, the standard deviation is an important parameter, as it tells us how well our harmonic function fits the observations. It does not, however, tell us the uncertainty of the observations, just how well they align with seasonal patterns defined by the model.
= land_VV_series - fitted_da
residuals_land = np.sum(residuals_land.dropna(dim="time").values ** 2)
sse
= residuals_land.size
nobs = nobs - (2 * k + 1)
dof
= np.sqrt(sse / dof)
stdev
print(f"Number of observations (NOBS): {nobs}")
print(f"Estimated standard deviation of the fit: {stdev: .4f}")
Number of observations (NOBS): 1214
Estimated standard deviation of the fit: 2.4799
Lets plot residuals that were used to calculate standard deviation and see the possible outliers that do not fit those seasonal patterns.
= residuals_land.dropna(dim="time")
land_residuals_ts = "Residual"
land_residuals_ts.name
land_residuals_ts.hvplot(="Residuals",
label="firebrick",
color=2,
line_width="Residuals of Harmonic Fit (Land Pixel)",
title="Time",
xlabel="Residual (VV)",
ylabel=900,
width=300,
height* event_line )
C.3.4.1 Lake Lentini Example
Lets see how time summed harmonic signal looks like for a lake pixel, where backscatter is more stable. Therefore, vegetation periodicities should be less pronounced over water.
= timeseries_dc.sel(point="lake").VV
lake_VV_series = build_initial_parameters(lake_VV_series, k=3)
param_names_lake, p0_lake
= lake_VV_series.curvefit(
fit_result ="time", func=harmonic_model, p0=p0_lake, param_names=param_names_lake
coords
)
= fit_result.curvefit_coefficients.values
estimated_params_lake
for name, val in zip(fit_result.param.values, estimated_params_lake):
print(f"{name: >6}: {val: .4f}")
mean: -20.4635
c1: -0.1479
s1: 0.1429
c2: 0.0382
s2: -0.2207
c3: -0.2902
s3: 0.0870
= estimated_params_lake[0]
mean = estimated_params_lake[1:]
coeffs
= harmonic_model(timeseries_dc["time.dayofyear"], mean, *coeffs)
fitted_vals
= xr.DataArray(
fitted_da ={"time": lake_VV_series.time}, dims="time", name="Harmonic Fit"
fitted_vals, coords
)
= lake_VV_series.hvplot(
plot ="Original", color="navy", alpha=0.75
label* fitted_da.hvplot(label="Harmonic Fit", color="darkorange", line_width=2.5)
)
plot.opts(="Harmonic Model Fit to VV Timeseries of a pixel inside lake Lentini",
title="Time",
xlabel="VV backscatter",
ylabel=900,
width=400,
height )
= lake_VV_series - fitted_da
residuals_lake = np.sum(residuals_lake.dropna(dim="time").values ** 2)
sse
= residuals_lake.size
nobs = nobs - (2 * k + 1)
dof
= np.sqrt(sse / dof)
stdev
print(f"Number of observations (NOBS): {nobs}")
print(f"Estimated standard deviation of the fit: {stdev: .4f}")
Number of observations (NOBS): 1214
Estimated standard deviation of the fit: 3.2212
= residuals_lake.dropna(dim="time")
lake_residuals_ts = "Residual"
lake_residuals_ts.name
lake_residuals_ts.hvplot(="Residuals",
label="firebrick",
color=2,
line_width="Residuals of Harmonic Fit (Lake Pixel)",
title="Time",
xlabel="Residual (VV)",
ylabel=900,
width=300,
height )
As one can notice, general pattern is a more stochastic signal. One can argue that setting k = 3
actually introduces artifacts, as the original signal was not periodic in the first place.
C.3.5 Overfitting Problem - Choosing \(k\) iterations
Parameter \(k\) that governs the number of harmonic terms, is usually two or three. Higher order terms would lead to overfitting. A flood event in time series would be an impulse (jump in backscatter value) that would propagate as an artefact if higher order harmonics are fitted to years-long time series. Higher order terms would usually have low amplitude, an estimation of those would highly depend on noise level in signal. Therefore, those harmonics would not be of a physical nature, or in other words, they wouldn’t represent seasonal vegetation cycles.
= [1, 2, 3, 10]
ks = []
fitted_das
for k in ks:
= build_initial_parameters(land_VV_series, k)
param_names, p0
= land_VV_series.curvefit(
fit_result ="time.dayofyear",
coords=harmonic_model,
func=param_names,
param_names="time",
reduce_dims
)
= fit_result.curvefit_coefficients.values
estimated_params = estimated_params[0]
mean = estimated_params[1:]
coeffs = harmonic_model(timeseries_dc["time.dayofyear"], mean, *coeffs)
fitted_vals
= xr.DataArray(
fitted_da
fitted_vals,={"time": land_VV_series.time},
coords="time",
dims=f"Harmonic Fit (k={k})",
name
)
fitted_das.append(fitted_da)
= land_VV_series.hvplot(label="Original", color="black", alpha=0.6)
plot
= ["red", "orange", "green", "blue"]
colors
for da, k_val, color in zip(fitted_das, ks, colors):
*= da.hvplot(label=f"k = {k_val}", line_width=2, color=color)
plot
plot.opts(="Harmonic Fits of VV Timeseries for Multiple k Values",
title="Time",
xlabel="VV Backscatter",
ylabel=900,
width=400,
height="top_left",
legend_position )