Appendix D — Homework Exercise 4: Datacubes

For the homework assignement we will continue work on the same data as in the in-class exercise. Hence we will load the same data which has been stored as a Zarr datastore. Zarr is efficient for storing chunked data and much faster for reading.

import xarray as xr
import rioxarray  # noqa
from pathlib import Path

import hvplot.xarray  # noqa
import intake

from mrs.catalog import get_intake_url
url = get_intake_url()
cat = intake.open_catalog(url)
fused_ds = cat["fused_array"].read().compute()
fused_ds
https://git.geo.tuwien.ac.at/public_projects/microwave-remote-sensing/-/raw/dev-exs/microwave-remote-sensing.yml
<xarray.Dataset> Size: 307MB
Dimensions:      (sensor: 4, time: 5, y: 1528, x: 2508)
Coordinates:
  * sensor       (sensor) object 32B 's1_VH' 's1_VV' 'alos_HH' 'alos_HV'
  * time         (time) datetime64[ns] 40B 2022-06-30 2022-07-31 ... 2022-10-31
  * y            (y) float64 12kB 1.397e+06 1.397e+06 ... 1.382e+06 1.382e+06
  * x            (x) float64 20kB 4.769e+06 4.769e+06 ... 4.794e+06 4.794e+06
    spatial_ref  int64 8B 0
Data variables:
    gam0         (sensor, time, y, x) float32 307MB -14.52 -14.8 ... -31.16

We want to expand the datacube of the in-class exercise with a new variable in this assignment. The new variable is the Leaf Area Index (LAI), which is a dimensionless index measuring the one-sided green leaf area over a unit of land (\(m^2 \cdot m^{-2}\)).

D.1 Question 1

Load the new LAI data with the below provided code snippet and extract the CRS and resolution of the raster. Apply what you have learned in the in-class exercise by only using the packages as listed in the imports of this notebook.

url = get_intake_url()
cat = intake.open_catalog(url)
lai_ds = cat["lai"].read().compute()
lai_ds
https://git.geo.tuwien.ac.at/public_projects/microwave-remote-sensing/-/raw/dev-exs/microwave-remote-sensing.yml
<xarray.Dataset> Size: 32MB
Dimensions:  (time: 35, lat: 337, lon: 337)
Coordinates:
  * time     (time) datetime64[ns] 280B 2022-01-20 2022-01-31 ... 2022-12-31
  * lat      (lat) float64 3kB 46.0 46.0 45.99 45.99 ... 45.01 45.01 45.0 45.0
  * lon      (lon) float64 3kB 10.0 10.0 10.01 10.01 ... 10.99 10.99 11.0 11.0
    crs      int64 8B 0
Data variables:
    LAI      (time, lat, lon) float64 32MB nan nan nan nan ... nan nan nan nan

D.2 Question 2

In order to compare LAI with ALOS-2 L-band and Sentinel-1 C-band data, we will merge this variable with the SAR datacube (of the in-class exercise). Let’s first check the temporal range of the SAR datacube (fused_ds) and the new xarray dataset: lai_ds.

fused_ds.time
<xarray.DataArray 'time' (time: 5)> Size: 40B
array(['2022-06-30T00:00:00.000000000', '2022-07-31T00:00:00.000000000',
       '2022-08-31T00:00:00.000000000', '2022-09-30T00:00:00.000000000',
       '2022-10-31T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time         (time) datetime64[ns] 40B 2022-06-30 2022-07-31 ... 2022-10-31
    spatial_ref  int64 8B 0
lai_ds.time
<xarray.DataArray 'time' (time: 35)> Size: 280B
array(['2022-01-20T00:00:00.000000000', '2022-01-31T00:00:00.000000000',
       '2022-02-10T00:00:00.000000000', '2022-02-20T00:00:00.000000000',
       '2022-02-28T00:00:00.000000000', '2022-03-10T00:00:00.000000000',
       '2022-03-20T00:00:00.000000000', '2022-03-31T00:00:00.000000000',
       '2022-04-10T00:00:00.000000000', '2022-04-20T00:00:00.000000000',
       '2022-04-30T00:00:00.000000000', '2022-05-10T00:00:00.000000000',
       '2022-05-20T00:00:00.000000000', '2022-05-31T00:00:00.000000000',
       '2022-06-10T00:00:00.000000000', '2022-06-20T00:00:00.000000000',
       '2022-06-30T00:00:00.000000000', '2022-07-10T00:00:00.000000000',
       '2022-07-20T00:00:00.000000000', '2022-07-31T00:00:00.000000000',
       '2022-08-10T00:00:00.000000000', '2022-08-20T00:00:00.000000000',
       '2022-08-31T00:00:00.000000000', '2022-09-10T00:00:00.000000000',
       '2022-09-20T00:00:00.000000000', '2022-09-30T00:00:00.000000000',
       '2022-10-10T00:00:00.000000000', '2022-10-20T00:00:00.000000000',
       '2022-10-31T00:00:00.000000000', '2022-11-10T00:00:00.000000000',
       '2022-11-20T00:00:00.000000000', '2022-11-30T00:00:00.000000000',
       '2022-12-10T00:00:00.000000000', '2022-12-20T00:00:00.000000000',
       '2022-12-31T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 280B 2022-01-20 2022-01-31 ... 2022-12-31
    crs      int64 8B 0
Attributes:
    axis:       T
    long_name:  Time

The temporal range of lai_ds is longer then the fused ALOS-2 L-band and Sentinel-1 C-band datacube. To fit lai_ds object to the SAR datacube, we will need to cut the lai_ds temporal extent using the selection method (sel). Complete the following code snippet to perform the previous described selection operation.

lai_ds = lai_ds.sel(time=slice(..., ...))  # YOUR CODE HERE
lai_ds.time

D.3 Question 3

Now that the temporal range of the LAI datacube matches that of fused_ds, we can continue by aligning the spatial coordinates, so that we can create a datacube containing both variables (LAI and SAR data). Yet again, apply the same methods as shown in the in-class exercise.

Let’s break this down into steps first align both datacubes.

  1. Align both datacubes. Remember to use the rioxarray package and use the default resampling method.
lai_ds = ...  # YOUR CODE HERE
lai_ds
  1. Write the coordinates of fused_ds to the reprojected lai_ds object to prevent mistakes caused by floating point errors.
lai_ds = lai_ds.assign_coords({"x": ..., "y": ...})  # YOUR CODE HERE
lai_ds

If the previous operations were successfull, 3) we can merge the two variables: SAR and LAI. We use a different xarray function for this, where we combine the two variable to a xarray.DataSet with the merge function.

# YOUR CODE HERE ----------------------------------------------------------
fused_ds = xr.merge([..., ...])  # combine two variables in an Xarray.Dataset
# YOUR CODE HERE ----------------------------------------------------------
fused_ds

The last step is the 4) resample operation to align the timestamps. Use again a median value.

fused_ds = ...  # YOUR CODE HERE

Plot the LAI variable with the following lines of code to check your results:

fused_ds.LAI.\
    dropna(dim="time", how="all").\
    hvplot.image(robust=True, data_aspect=1, cmap="viridis", rasterize=True).\
    opts(frame_height=400, aspect="equal")
to_store = fused_ds.copy()
for var in to_store.variables:
    to_store[var].encoding.clear()
to_store.to_zarr("fused_ds.zarr", mode="w")