In this notebook, we are going to have a look at the conversion of units. Sentinel-1 data, and most other SAR data, is usually provided in decibels (dB). We will discover the advantages of displaying SAR data in decibels and why we need to convert the data to a linear scale in order to make meaningful calculations. Let's start by importing some libraries.

```python
import numpy as np
import pystac_client
import odc.stac
import matplotlib.pyplot as plt
import xarray as xr
import rioxarray  # noqa
```
\[ \text{logarithmic} \longleftrightarrow \text{linear} \] \[ [\text{dB}] \longleftrightarrow [\text{m}^2 \cdot \text{m}^{-2}] \]
2.1 Exploring the Data
Let’s start by loading some sample data in order to demonstrate why this conversion is important. Here we will have a look at some SAR data from Sentinel-1, which is provided in decibels (dB). In the following example, we will:
- load data from Sentinel-1
- visualize the data in logarithmic scale
- compare the data with linear scale
2.2 Search for some Data
Now we start by loading Sentinel-1 data from the EODC STAC Catalogue. We do this in the same way as in notebook 1.
```python
latmin, latmax = 48, 48.5
lonmin, lonmax = 16, 17
bounds = (lonmin, latmin, lonmax, latmax)

time_range = "2022-01-01/2022-12-30"

items = (
    pystac_client.Client.open("https://stac.eodc.eu/api/v1")
    .search(
        bbox=bounds,
        collections=["SENTINEL1_SIG0_20M"],
        datetime=time_range,
        limit=100,
    )
    .item_collection()
)

print(len(items), "scenes found")
```

708 scenes found
= "VV"
bands = "EPSG:27704" # Coordinate Reference System: EQUI7 Grid of Europe
crs = 20 # 20 meter
res
= odc.stac.stac_load(
sig0_dc
items,=bands,
bands=bounds,
bbox={"time": 5, "x": 600, "y": 600},
chunks
)
= items[0].assets["VV"].extra_fields["raster:bands"][0]["nodata"]
nodata = items[0].assets["VV"].extra_fields["raster:bands"][0]["scale"]
scale
= (sig0_dc.where(sig0_dc != nodata) / scale).VV
sig0_dc sig0_dc
<xarray.DataArray 'VV' (time: 708, y: 3150, x: 3978)> Size: 35GB dask.array<truediv, shape=(708, 3150, 3978), dtype=float32, chunksize=(5, 600, 600), chunktype=numpy.ndarray> Coordinates: * y (y) float64 25kB 1.653e+06 1.653e+06 ... 1.59e+06 1.59e+06 * x (x) float64 32kB 5.241e+06 5.241e+06 ... 5.32e+06 5.32e+06 spatial_ref int32 4B 27704 * time (time) datetime64[ns] 6kB 2022-01-01T16:42:42 ... 2022-12-29...
2.3 Comparison of the Data in dB and Linear Scale
In the next two cells we will select a subset of the data. This is done to reduce the amount of data we are working with. The precise workflow is not important for now, since the theory is explained after the cells. They are just here to show the data we are working with.
```python
subset = sig0_dc.sel(time=slice("2022-07-01", "2022-07-07"))
subset = subset.dropna("time", how="all")
subset
```
Now plot the data.
```python
aoi = subset.isel(time=0, x=slice(0, 500), y=slice(0, 500))
aoi_lin = 10 ** (aoi / 10)
```
```python
fig, ax = plt.subplots(2, 3, figsize=(14, 8))

# upper left
ax_ul = ax[0, 0]
aoi.plot.imshow(robust=True, ax=ax_ul, cmap="Greys_r")
ax_ul.set_title(r"$\sigma^0$ [$dB$] (robust plot)")
ax_ul.axes.set_aspect("equal")

# upper middle
ax_um = ax[0, 1]
aoi.plot.imshow(robust=False, ax=ax_um, cmap="Greys_r")
ax_um.set_title(r"$\sigma^0$ [$dB$] (not robust plot)")
ax_um.axes.set_aspect("equal")

# upper right
ax_ur = ax[0, 2]
aoi.plot.hist(bins=50, ax=ax_ur, edgecolor="black")
ax_ur.set_xlabel(r"$\sigma^0$ [$dB$]")
ax_ur.set_title(r"$\sigma^0$ [$dB$] distribution")
ax_ur.set_ylabel("n (number of pixels)")

# lower left
ax_ll = ax[1, 0]
aoi_lin.plot.imshow(robust=True, ax=ax_ll, cmap="Greys_r")
ax_ll.set_title(r"$\sigma^0$ [$m^2 \cdot m^{-2}$] (robust plot)")
ax_ll.axes.set_aspect("equal")

# lower middle
ax_lm = ax[1, 1]
aoi_lin.plot.imshow(robust=False, ax=ax_lm, cmap="Greys_r")
ax_lm.set_title(r"$\sigma^0$ [$m^2 \cdot m^{-2}$] (not robust plot)")
ax_lm.axes.set_aspect("equal")

# lower right
ax_lr = ax[1, 2]
aoi_lin.plot.hist(bins=50, ax=ax_lr, edgecolor="black")
ax_lr.set_xlabel(r"$\sigma^0$ [$m^2 \cdot m^{-2}$]")
ax_lr.set_ylabel("n (number of pixels)")
ax_lr.set_title(r"$\sigma^0$ [$m^2 \cdot m^{-2}$] distribution")

title = (
    r"Sentinel-1 backscatter $\sigma^0$ comparison"
    + r" in linear and logarithmic domain"
)
fig.suptitle(title, horizontalalignment="center")
plt.tight_layout()
```
Figure 1: Visually comparing \(\sigma^0\) on a logarithmic and linear scale (left column). In addition, the benefit of using the robust plotting method is shown (middle column). The robust argument uses the 2nd and 98th percentiles of the data to compute the color limits to eliminate washing out the plot due to data outliers.
In the plot above you can see the difference between the two scales. The dB values are distributed fairly symmetrically, which makes them easy to display with a simple color stretch. The linear values are strongly skewed, with a few large outliers dominating the color scale, which makes them harder to interpret visually. This is why we use the dB scale for plotting and visualization.
While the logarithmic scale facilitates visual interpretation, it has implications for mathematical operations. In the following, we’ll have a closer look at this. But first, let’s see how we can convert between the linear and the logarithmic domains.
2.4 Conversion Formulas
The decibel (dB) is a logarithmic unit used to express the ratio of two values of a physical quantity, often power or intensity. In the case of SAR data, the backscatter coefficient is often expressed in dB to facilitate visualization.
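To get a feeling for the unit, here is a minimal NumPy sketch (independent of the dataset above) of the ratio-to-dB mapping; note how five orders of magnitude in linear units collapse into a range of just 50 dB:

```python
import numpy as np

# Map a few linear power ratios to decibels: dB = 10 * log10(ratio)
ratios = np.array([0.001, 0.01, 0.1, 1.0, 2.0, 10.0, 100.0])
ratios_db = 10 * np.log10(ratios)

for ratio, db in zip(ratios, ratios_db):
    print(f"{ratio:>8} -> {db:6.1f} dB")
```

A ratio of 1 maps to 0 dB, doubling adds roughly 3 dB, and every factor of 10 adds exactly 10 dB.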
In order to convert between dB and linear scale, we use the following formulas. Let \(D\) be the original value in dB and \(I\) the converted value in linear units (\(\text{m}^2 \cdot \text{m}^{-2}\)). The conversion to dB can be expressed as: \[
D = 10 \cdot \log_{10}(I) = 10 \cdot \log_{10}(e) \cdot \ln(I) \longrightarrow [\text{dB}]
\] Similarly, the conversion back to the original unit can be expressed as: \[
I = e^{\frac{D}{10 \cdot \log_{10}(e)}} = 10^{\frac{D}{10}} \longrightarrow [\text{m}^2 \cdot \text{m}^{-2}]
\] You can find these formulas in the script for Microwave Remote Sensing on page 136 (equation 6.40).
Now let’s implement the conversion in Python.
```python
def lin2db(val: float | int) -> float:
    """
    Converts value from linear to dB units.

    :param val: Value in linear units.
    :type val: float | int
    :return: Value in dB.
    :rtype: float
    """
    return 10 * np.log10(val)


def db2lin(val: float | int) -> float:
    """
    Converts value from dB to linear units.

    :param val: Value in dB.
    :type val: float | int
    :return: Value in linear units.
    :rtype: float
    """
    return 10 ** (val / 10)
```
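A quick sanity check of these helpers (redefined below so the cell is self-contained): converting to dB and back should recover the original values, and the reference points 0 dB ↔ 1 and 10 dB ↔ 10 should hold.

```python
import numpy as np

def lin2db(val):
    """Convert linear units to dB."""
    return 10 * np.log10(val)

def db2lin(val):
    """Convert dB to linear units."""
    return 10 ** (val / 10)

vals = np.array([0.01, 0.5, 1.0, 25.0])
# The round trip recovers the input up to floating-point error
assert np.allclose(db2lin(lin2db(vals)), vals)
assert lin2db(1.0) == 0.0    # a ratio of 1 is 0 dB
assert db2lin(10.0) == 10.0  # 10 dB is a factor of 10
print("round trip OK")
```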
When performing mathematical operations with SAR data it is important to be aware that adding values in the logarithmic scale doesn’t work in the same way as adding regular (linear) values. This is because in the logarithmic scale, each unit step represents an equal multiplication. This means that an addition of two values in the logarithmic scale equals a multiplication of the values in the linear scale. Vice versa, a subtraction in logarithmic scale equals a division in linear scale. Let’s have a look at an example, where we add two values, once without the conversion to linear scale and once with the conversion to linear scale.
```python
# Values in linear and decibel units
val1_db, val2_db = 10, 12

# Logarithmic addition
sum_db = val1_db + val2_db
print("Logarithmic Addition:")
print(f"Logarithmic values: \t{val1_db: <5}, {val2_db: <5} [dB]")
print(f"Logarithmic sum: \t{val1_db} + {val2_db} = {sum_db: <5} [dB]")

# Linear addition
val1_lin, val2_lin = db2lin(val1_db), db2lin(val2_db)
sum_lin = val1_lin + val2_lin
print("\nLinear Addition:")
print(f"""Linear values: \t\t{val1_lin: <5}, {val2_lin: <5.2f} [lin]
\t\t\t(converted from dB)""")
print(f"Linear sum: \t\t{val1_lin} + {val2_lin: .2f} = {sum_lin: .2f} [lin]")
print(f"\t\t\t= {lin2db(sum_lin): .2f} [dB]")
```
Logarithmic Addition:
Logarithmic values:    10   , 12    [dB]
Logarithmic sum:       10 + 12 = 22    [dB]

Linear Addition:
Linear values:         10.0 , 15.85 [lin]
                       (converted from dB)
Linear sum:            10.0 + 15.85 = 25.85 [lin]
                       = 14.12 [dB]
As you can see, the two results differ considerably: in the example above, the sums differ by a factor of around 6 when compared in linear scale (22 dB vs. 14.12 dB).
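The identity behind this behaviour can be checked directly: a sum in dB corresponds to a product of the linear values. A small self-contained sketch with the same two numbers (helpers redefined here):

```python
import numpy as np

def lin2db(val):
    return 10 * np.log10(val)

def db2lin(val):
    return 10 ** (val / 10)

a_db, b_db = 10, 12

# Adding dB values multiplies the linear values: 10 dB + 12 dB = 22 dB
prod_lin = db2lin(a_db) * db2lin(b_db)
assert np.isclose(lin2db(prod_lin), a_db + b_db)

# Adding the linear values gives a different result
sum_lin = db2lin(a_db) + db2lin(b_db)
print(f"dB sum: {a_db + b_db} dB, linear sum: {sum_lin:.2f} -> {lin2db(sum_lin):.2f} dB")
```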
Now that we have some data, we will have a look at some practical examples where we convert the data to linear scale. When we calculate the average \(\sigma^0\) value across the scene, we need to convert the data to linear scale first, then calculate the average, and finally convert the result back to dB.
2.4.1 Creating a Monthly Mosaic
In the beginning we lazily loaded data for an area across a whole year, so we have around 700 images. We will now compress the data of each month into one timestamp. This is done by using the `resample` method together with an aggregating operation like `mean` that involves summation. Since the data is in dB, we first convert it to linear scale, then resample, and finally convert back to dB.
```python
# Convert to linear scale and calculate monthly means
# Conversion by calculating with the xarray object
sig0_lin = 10 ** (sig0_dc / 10)

# Resample to monthly means. Time accepts intervals identical to the pandas
# resample function: 'D' for days, 'W' for weeks, 'ME' for months.
sig0_lin_monthly = sig0_lin.resample(time="1ME").mean()

# Convert back to dB scale
# Conversion by applying a function
sig0_monthly = lin2db(sig0_lin_monthly)
sig0_monthly
```
<xarray.DataArray 'VV' (time: 12, y: 3150, x: 3978)> Size: 601MB
dask.array<mul, shape=(12, 3150, 3978), dtype=float32, chunksize=(1, 600, 600), chunktype=numpy.ndarray>
Coordinates:
  * y            (y) float64 25kB 1.653e+06 1.653e+06 ... 1.59e+06 1.59e+06
  * x            (x) float64 32kB 5.241e+06 5.241e+06 ... 5.32e+06 5.32e+06
    spatial_ref  int32 4B 27704
  * time         (time) datetime64[ns] 96B 2022-01-31 2022-02-28 ... 2022-12-31
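The same convert–aggregate–convert pattern can be seen on a tiny synthetic series (hypothetical dB values, and `groupby("time.month")` instead of `resample` so the sketch does not depend on a particular pandas frequency alias):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two hypothetical dB observations per month
times = pd.to_datetime(["2022-01-05", "2022-01-20", "2022-02-05", "2022-02-20"])
db = xr.DataArray([-10.0, -12.0, -8.0, -9.0], coords={"time": times}, dims="time")

# dB -> linear, aggregate per month, linear -> dB
monthly_lin = (10 ** (db / 10)).groupby("time.month").mean()
monthly_db = 10 * np.log10(monthly_lin)
print(monthly_db.values)
```

Note that each monthly value lies between that month's dB observations but is not their arithmetic mean (which would be -11.0 and -8.5 dB).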
The dataset now has only 12 timestamps, one for each month. Next, we want to calculate the average \(\sigma^0\) value across the scene for one month. Again, we convert the data to linear scale first, calculate the average, and convert the result back to dB.
```python
# Let's take a data array with dB values
db_array = sig0_monthly.sel(time="2022-07-30", method="nearest")

# Compute the linear values
lin_array = db2lin(db_array)

# Compute the average backscatter value in linear units across the whole scene
lin_mean = lin_array.mean()
print(f"Average backscatter value in linear units: {lin_mean.values: .3f}")

db_from_lin_mean = lin2db(lin_mean)
print(f"That value in dB: {db_from_lin_mean.values: .3f}\n")

# Compute the average backscatter value in dB across the whole scene
db_mean = db_array.mean()
print(f"Average backscatter value in dB: {db_mean.values: .3f}")
```
Average backscatter value in linear units: 0.124
That value in dB: -9.060
Average backscatter value in dB: -10.550
As you can see in the example, the mean values across the scene differ between dB and linear scale. It is therefore important to be aware of the scale in which the data is stored, so that the correct type of mathematical operation is performed, or to always convert the data to linear scale before doing any calculations.
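Why the dB-domain mean is systematically lower can be demonstrated on synthetic data (not the Sentinel-1 scene): the logarithm is concave, so by Jensen's inequality the mean of the dB values is at most the dB of the linear mean. A minimal sketch, assuming a hypothetical lognormal backscatter distribution:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic skewed backscatter sample in linear units (hypothetical distribution)
lin = rng.lognormal(mean=-2.3, sigma=0.8, size=10_000)
db = 10 * np.log10(lin)

mean_from_lin = 10 * np.log10(lin.mean())  # correct: average in the linear domain
mean_of_db = db.mean()                     # biased low: average of dB values

print(f"mean in linear domain: {mean_from_lin:.2f} dB")
print(f"mean of dB values:     {mean_of_db:.2f} dB")
assert mean_of_db < mean_from_lin
```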
2.5 Save Mean Mosaic as Tif File
Often we want to store the output of a computation permanently on a file system. The most common file format for this is a GeoTIFF (a TIF file with additional georeferencing information). The following cell shows how easily this can be done with Xarray. When we want to store the data as a GeoTIFF, we need to make sure to provide a spatial reference to geolocate the data. The best way to check whether an Xarray object has a coordinate reference system (CRS) is the rioxarray `rio.crs` accessor. More about this in notebook 4.
```python
# Select some data which we want to save
data_2_save = sig0_monthly.sel(time="2022-07-30", method="nearest")
data_2_save.rio.crs
```
CRS.from_wkt('PROJCS["Azimuthal_Equidistant",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Azimuthal_Equidistant"],PARAMETER["latitude_of_center",53],PARAMETER["longitude_of_center",24],PARAMETER["false_easting",5837287.81977],PARAMETER["false_northing",2121415.69617],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]')
In this case, the spatial reference is the EPSG code EPSG:27704, which is the Equi7Grid Europe. As the output DataArray already has a spatial reference, we can now save it as a raster file.
```python
# Save the data
data_2_save.rio.to_raster(
    "sig0_mean_mosaic_july.tif", tiled=True, driver="GTiff", compress="LZW"
)

# Load the data again (for demonstration purposes)
loaded_data = xr.open_dataset("sig0_mean_mosaic_july.tif", engine="rasterio")
loaded_data
```
<xarray.Dataset> Size: 50MB
Dimensions:      (band: 1, x: 3978, y: 3150)
Coordinates:
  * band         (band) int64 8B 1
  * x            (x) float64 32kB 5.241e+06 5.241e+06 ... 5.32e+06 5.32e+06
  * y            (y) float64 25kB 1.653e+06 1.653e+06 ... 1.59e+06 1.59e+06
    spatial_ref  int64 8B ...
Data variables:
    band_data    (band, y, x) float32 50MB ...