Calculating tide statistics and satellite biases¶
This guide demonstrates how to use the tide_stats
and pixel_stats
functions from eo_tides.stats
to calculate local tide statistics and identify biases caused by interactions between tidal processes and satellite orbits.
Complex tide aliasing interactions between temporal tide dynamics and the regular overpass timing of sun-synchronous satellite sensors mean that satellites often do not always observe the entire tidal cycle. Biases in satellite coverage of the tidal cycle can mean that tidal extremes (e.g. the lowest or highest tides at a location) or particular tidal processes may either never be captured by satellites, or be over-represented in the satellite record. Local tide dynamics can cause these biases to vary greatly both through time and space, making it challenging to compare coastal processes consistently - particularly for large-scale coastal EO analyses.
To ensure that coastal EO analyses are not inadvertently affected by tide biases, it is important to compare how well the tides observed by satellites match the full range of tides at a location.
The tidal_stats
and pixel_stats
functions compares the subset of tides observed by satellite data against the full range of tides modelled at a regular interval through time (every two hours by default) across the entire time period covered by the satellite dataset.
This comparison is used to calculate several useful statistics and plots that summarise how well your satellite data capture real-world tidal conditions.
Figure 1: Example of tide biases in Landsat satellite data across coastal Australia (Bishop-Taylor et al. 2018). "Spread" represents the proportion of the astronomical tide range observed by satellites; low and high tide "offsets" represent the proportion of highest and lowest tides never observed.
Tip
For a more detailed discussion of the issue of tidal bias in sun-synchronous satellite observations of the coastline, refer to the 'Limitations and future work' section in Bishop-Taylor et al. 2018.
Getting started¶
As in the previous examples, our first step is to tell eo-tides
the location of our tide model or clipped tide model directory (if you haven't set this up, refer to the setup instructions here):
directory = "../../tests/data/tide_models/"
Load Sentinel-2 satellite data using odc-stac¶
We can now load a time-series of satellite data over our area of interest using the Open Data Cube's odc-stac
package.
In this example, we will load Sentinel-2 satellite data from 2021-2023 over the city of Broome, Western Australia.
We will load this data from the Microsoft Planetary Computer STAC catalogue.
Tip
For a more detailed guide to using STAC metadata and odc-stac
to find and load satellite data, refer to the STAC user guide here.
import odc.stac
import pystac_client
import planetary_computer
# Connect to STAC catalog
catalog = pystac_client.Client.open(
"https://planetarycomputer.microsoft.com/api/stac/v1",
modifier=planetary_computer.sign_inplace,
)
# Set cloud access defaults
odc.stac.configure_rio(
cloud_defaults=True,
aws={"aws_unsigned": True},
)
# Build a query and search the STAC catalog for all matching items
bbox = [122.160, -18.05, 122.260, -17.95]
query = catalog.search(
bbox=bbox,
collections=["sentinel-2-l2a"],
datetime="2021/2023",
)
# Load data into xarray format
ds_s2 = odc.stac.load(
items=list(query.items()),
bands=["red"],
crs="utm",
resolution=30,
groupby="solar_day",
bbox=bbox,
fail_on_error=False,
chunks={},
)
print(ds_s2)
<xarray.Dataset> Size: 111MB Dimensions: (y: 371, x: 356, time: 211) Coordinates: * y (y) float64 3kB 8.015e+06 8.015e+06 ... 8.004e+06 8.004e+06 * x (x) float64 3kB 4.11e+05 4.111e+05 ... 4.217e+05 4.217e+05 spatial_ref int32 4B 32751 * time (time) datetime64[ns] 2kB 2021-01-03T02:04:51.024000 ... 202... Data variables: red (time, y, x) float32 111MB dask.array<chunksize=(1, 371, 356), meta=np.ndarray>
Using tide_stats¶
Once we have loaded some satellite data, we can pass this to the tide_stats
function to calculate local tide statistics and reveal any potential tidal biases.
The tide_stats
function will return a plain-text summary below, as well as a visual plot that compares the distribution of satellite-observed tides (black dots) against the full range of modelled astronomical tide conditions (blue) using three useful metrics:
- Spread: The proportion of the full modelled astronomical tidal range that was observed by satellites. A high value indicating good coverage of the tide range.
- High tide offset: The proportion of the highest tides not observed by satellites at any time, as a proportion of the full modelled astronomical tidal range. A high value indicates that the satellite data is biased towards never capturing high tides.
- Low tide offset: The proportion of the lowest tides not observed by satellites at any time, as a proportion of the full modelled astronomical tidal range. A high value indicates that the satellite data is biased towards never capturing low tides.
Tip
For a more detailed description of these biases, see Bishop-Taylor et al. 2018.
from eo_tides.stats import tide_stats
statistics_df = tide_stats(
data=ds_s2,
directory=directory,
)
Using tide modelling location: 122.21, -18.00 Modelling tides with EOT20 Using tide modelling location: 122.21, -18.00 Modelling tides with EOT20 🌊 Modelled astronomical tide range: 9.30 m (-4.60 to 4.70 m). 🛰️ Observed tide range: 6.29 m (-2.36 to 3.93 m). 🔴 68% of the modelled astronomical tide range was observed at this location. 🟢 The highest 8% (0.77 m) of the tide range was never observed. 🔴 The lowest 24% (2.25 m) of the tide range was never observed. 🌊 Mean modelled astronomical tide height: -0.00 m. 🛰️ Mean observed tide height: 0.69 m. ⬆️ The mean observed tide height was 0.69 m higher than the mean modelled astronomical tide height.
As we can see in the graph, Sentinel-2 captured a biased proportion of the tide range at this location: only observing ~68% of the tide range, and never observing the lowest 24% of tides.
The tide_stats
function also outputs a pandas.Series
object containing statistics for the results above, including:
mot
: mean tide height observed by the satellite (metres)mat
: mean modelled astronomical tide height (metres)hot
: maximum tide height observed by the satellite (metres)hat
: maximum tide height from modelled astronomical tidal range (metres)lot
: minimum tide height observed by the satellite (metres)lat
: minimum tide height from modelled astronomical tidal range (metres)otr
: tidal range observed by the satellite (metres)tr
: modelled astronomical tide range (metres)spread
: proportion of the full modelled tidal range observed by the satelliteoffset_low
: proportion of the lowest tides never observed by the satelliteoffset_high
: proportion of the highest tides never observed by the satellitey
: latitude used for modelling tide heightsx
: longitude used for modelling tide heights
statistics_df
mot 0.691000 mat -0.000000 hot 3.930000 hat 4.696000 lot -2.355000 lat -4.604000 otr 6.285000 tr 9.300000 spread 0.676000 offset_low 0.242000 offset_high 0.082000 x 122.209999 y -18.000000 dtype: float32
Compare against Sentinel-1 tide biases¶
In the previous example, we saw that Sentinel-2 data was biased towards never capturing low tide at our location. These biases are caused by the consistent 10:30 am local overpass time of the Sentinel-2 satellites, which due to a phenomenon called "tide-aliasing" means that certain tides never occur when the satellite overpasses.
One possible way around these biases is to use different satellite data from satellites that overpass at different times. For example, Sentinel-1 radar satellites follow a different orbit to Sentinel-2, overpassing at a local time of 6:00 pm instead of 10:30 am. This diference in overpass time potentially means that Sentinel-1 satellite data may capture different tides to Sentinel-2.
In the next example, we run the tide_stats
function on data loaded from Sentinel-1 for the same location and time period to see if this data is affected by the same tide biases.
# Build a query and search the STAC catalog for all matching items
bbox = [122.160, -18.05, 122.260, -17.95]
query = catalog.search(
bbox=bbox,
collections=["sentinel-1-rtc"],
datetime="2021/2023",
)
# Load data into xarray format
ds_s1 = odc.stac.load(
items=list(query.items()),
bands=["vv"],
crs="utm",
resolution=30,
groupby="solar_day",
bbox=bbox,
fail_on_error=False,
chunks={},
)
print(ds_s1)
<xarray.Dataset> Size: 47MB Dimensions: (y: 371, x: 356, time: 89) Coordinates: * y (y) float64 3kB 8.015e+06 8.015e+06 ... 8.004e+06 8.004e+06 * x (x) float64 3kB 4.11e+05 4.111e+05 ... 4.217e+05 4.217e+05 spatial_ref int32 4B 32751 * time (time) datetime64[ns] 712B 2021-01-07T21:23:20.996123 ... 20... Data variables: vv (time, y, x) float32 47MB dask.array<chunksize=(1, 371, 356), meta=np.ndarray>
When we run tide_stats
, we can see a very different pattern: Sentinel-1 data is biased towards low tide observations, and never observes high tide at our location!
statistics_df = tide_stats(
data=ds_s1,
directory=directory,
)
Using tide modelling location: 122.21, -18.00 Modelling tides with EOT20 Using tide modelling location: 122.21, -18.00 Modelling tides with EOT20 🌊 Modelled astronomical tide range: 9.55 m (-4.81 to 4.74 m). 🛰️ Observed tide range: 6.40 m (-4.39 to 2.02 m). 🔴 67% of the modelled astronomical tide range was observed at this location. 🔴 The highest 28% (2.72 m) of the tide range was never observed. 🟢 The lowest 4% (0.43 m) of the tide range was never observed. 🌊 Mean modelled astronomical tide height: -0.00 m. 🛰️ Mean observed tide height: -1.31 m. ⬇️ The mean observed tide height was -1.31 m lower than the mean modelled astronomical tide height.
Calculate biases for multiple satellite sensors¶
At our location, Sentinel-2 optical satellites are biased towards high tide observations, while Sentinel-1 radar satellites are biased towards low tide observations. Could combining data from multiple EO sensors help us capture a more complete view of tides at this location?
To test this theory, we can combine Sentinel-2 and Sentinel-1 data into a single xarray.Dataset
, recording the name of each sensor using a new satellite_name
coordinate in our data:
import xarray as xr
# Give each observation a "satellite_name" based on its satellite
ds_s1 = ds_s1.assign_coords(satellite_name=("time", ["Sentinel-1"] * ds_s1.sizes["time"]))
ds_s2 = ds_s2.assign_coords(satellite_name=("time", ["Sentinel-2"] * ds_s2.sizes["time"]))
# Combine both Sentinel-1 and Sentinel-2 data into a single dataset
ds_all = xr.concat([ds_s1, ds_s2], dim="time")
print(ds_all)
<xarray.Dataset> Size: 317MB Dimensions: (time: 300, y: 371, x: 356) Coordinates: * y (y) float64 3kB 8.015e+06 8.015e+06 ... 8.004e+06 8.004e+06 * x (x) float64 3kB 4.11e+05 4.111e+05 ... 4.217e+05 4.217e+05 spatial_ref int32 4B 32751 * time (time) datetime64[ns] 2kB 2021-01-07T21:23:20.996123 ... ... satellite_name (time) <U10 12kB 'Sentinel-1' 'Sentinel-1' ... 'Sentinel-2' Data variables: vv (time, y, x) float32 158MB dask.array<chunksize=(1, 371, 356), meta=np.ndarray> red (time, y, x) float32 158MB dask.array<chunksize=(1, 371, 356), meta=np.ndarray>
We can now run tide_stats
again.
This time, we pass our satellite name coordinate to the function using the plot_var="satellite_name"
parameter.
This will plot data from each of our satellites using a different symbol and colour.
statistics_df = tide_stats(
data=ds_all,
plot_var="satellite_name",
directory=directory,
)
Using tide modelling location: 122.21, -18.00 Modelling tides with EOT20 Using tide modelling location: 122.21, -18.00 Modelling tides with EOT20 🌊 Modelled astronomical tide range: 9.30 m (-4.60 to 4.70 m). 🛰️ Observed tide range: 8.32 m (-4.39 to 3.93 m). 🟡 89% of the modelled astronomical tide range was observed at this location. 🟢 The highest 8% (0.77 m) of the tide range was never observed. 🟢 The lowest 2% (0.22 m) of the tide range was never observed. 🌊 Mean modelled astronomical tide height: -0.00 m. 🛰️ Mean observed tide height: 0.10 m. ⬆️ The mean observed tide height was 0.10 m higher than the mean modelled astronomical tide height.
We can see that at this location, combining Sentinel-2 and Sentinel-1 data greatly improves our biases: our satellite data now covers ~85% of the modelled astronomical tide range, and only fails to observe 10% of the highest tides and 6% of the lowest tides!
Using pixel_stats¶
Modelling tide statistics and biases spatially¶
Because tide regimes and satellite biases can vary greatly along the coast, it can be useful to plot these biases spatially.
To do this, we can use the pixel_stats
function.
pixel_stats
works similarly to tide_stats
, except statistics are calculated across the entire extent of your satellite dataset.
The function will generate an xarray.Dataset
output containing the statistics discussed above as two-dimensional arrays.
Tip
The pixel_stats
function uses eo_tides.eo.pixel_tides
to model tides spatially. You can experiment passing in parameters like resolution
and buffer
to customise the modelling grid used for calculating tide biases. Be warned however that you can quickly run out of memory with large analyses, given the number of timesteps required to model astronomical low and high tide.
from eo_tides.stats import pixel_stats
stats_ds = pixel_stats(
data=ds_s2,
directory=directory,
)
print(stats_ds)
Creating reduced resolution 5000 x 5000 metre tide modelling array Modelling tides with EOT20 Returning low resolution tide array Creating reduced resolution 5000 x 5000 metre tide modelling array Modelling tides with EOT20 Returning low resolution tide array
/workspaces/eo-tides/.venv/lib/python3.12/site-packages/numpy/lib/_nanfunctions_impl.py:1634: RuntimeWarning: All-NaN slice encountered return fnb._ureduce(a,
Reprojecting statistics into original resolution
/workspaces/eo-tides/.venv/lib/python3.12/site-packages/rasterio/warp.py:387: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned. dest = _reproject(
<xarray.Dataset> Size: 6MB Dimensions: (y: 371, x: 356) Coordinates: tide_model <U5 20B 'EOT20' * y (y) float64 3kB 8.015e+06 8.015e+06 ... 8.004e+06 8.004e+06 * x (x) float64 3kB 4.11e+05 4.111e+05 ... 4.217e+05 4.217e+05 spatial_ref int32 4B 32751 Data variables: mot (y, x) float32 528kB 0.7088 0.7088 0.7088 ... 0.6832 0.6832 mat (y, x) float32 528kB -0.0001353 -0.0001353 ... -0.0001289 hot (y, x) float32 528kB 3.874 3.874 3.874 ... 3.948 3.948 3.948 hat (y, x) float32 528kB 4.494 4.494 4.494 ... 4.773 4.773 4.773 lot (y, x) float32 528kB -2.236 -2.236 -2.236 ... -2.402 -2.402 lat (y, x) float32 528kB -4.425 -4.425 -4.425 ... -4.672 -4.673 otr (y, x) float32 528kB 6.109 6.109 6.109 6.109 ... 6.35 6.35 6.35 tr (y, x) float32 528kB 8.919 8.919 8.919 ... 9.445 9.445 9.446 spread (y, x) float32 528kB 0.685 0.685 0.685 ... 0.6723 0.6723 0.6723 offset_low (y, x) float32 528kB 0.2455 0.2455 0.2455 ... 0.2403 0.2403 offset_high (y, x) float32 528kB 0.0695 0.0695 0.0695 ... 0.08738 0.0874
We can explore these statistics on a map:
stats_ds.spread.odc.explore()
Or plot them directly:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(16, 4))
stats_ds.spread.plot(ax=axes[0], vmin=0.5, vmax=0.9, cmap="magma_r", add_labels=False)
stats_ds.offset_low.plot(ax=axes[1], vmin=0, vmax=0.35, cmap="magma", add_labels=False)
stats_ds.offset_high.plot(ax=axes[2], vmin=0, vmax=0.35, cmap="magma", add_labels=False)
axes[0].set_title("Spread")
axes[1].set_title("Low tide offset")
axes[2].set_title("High tide offset");
Next steps¶
We have explored calculating tide statistics and biases in EO data. Now we can learn how to validate modelled tides against measured tide gauge data to ensure the tides we are modelling are accurate.