Calculating tide statistics and satellite biases¶

This guide demonstrates how to use the tide_stats, pixel_stats and tide_aliasing functions from eo_tides.stats to calculate local tide statistics and identify biases caused by interactions between tidal processes and satellite orbits.

Complex tide aliasing interactions between temporal tide dynamics and the regular overpass timing of sun-synchronous satellite sensors mean that satellites often do not always observe the entire tidal cycle. Biases in satellite coverage of the tidal cycle can mean that tidal extremes (e.g. the lowest or highest tides at a location) or particular tidal processes may either never be captured by satellites, or be over-represented in the satellite record. Local tide dynamics can cause these biases to vary greatly both through time and space, making it challenging to compare coastal processes consistently - particularly for large-scale coastal EO analyses.

To ensure that coastal EO analyses are not inadvertently affected by tide biases, it is important to compare how well the tides observed by satellites match the full range of tides at a location. The tidal_stats and pixel_stats functions compares the subset of tides observed by satellite data against the full range of tides modelled at a regular interval through time (every two hours by default) across the entire time period covered by the satellite dataset. This comparison is used to calculate several useful statistics and plots that summarise how well your satellite data capture real-world tidal conditions. The tide_aliasing function can be used to identify potential temporal tide biases, and ensure EO analyses include enough satellite data to adequately sample the entire tide cycle.

Figure 1: Example of tide biases in Landsat satellite data across coastal Australia (Bishop-Taylor et al. 2018). "Spread" represents the proportion of the astronomical tide range observed by satellites; low and high tide "offsets" represent the proportion of highest and lowest tides never observed.

Tip

For a more detailed discussion of the issue of tidal bias in sun-synchronous satellite observations of the coastline, refer to the 'Limitations and future work' section in Bishop-Taylor et al. 2018.

Getting started¶

As in the previous examples, our first step is to tell eo-tides the location of our tide model directory (if you haven't set this up, refer to the setup instructions here).

We can also define the tide model we want to use the for analysis; for example, the default model "EOT20".

In [1]:

Copied!

directory = "../../tests/data/tide_models/"
model = "EOT20"
directory = "../../tests/data/tide_models/"
model = "EOT20"

Load Sentinel-2 satellite data using odc-stac¶

We can now load a time-series of satellite data over our area of interest using the Open Data Cube's odc-stac package. In this example, we will load Sentinel-2 satellite data from 2021-2023 over the city of Broome, Western Australia. We will load this data from the Microsoft Planetary Computer STAC catalogue.

Tip

For a more detailed guide to using STAC metadata and odc-stac to find and load satellite data, refer to the odc-stac Python package documentation. eo-tides is compatible with satellite data loaded from any STAC API, for example Digital Earth Australia.

In [2]:

Copied!





import odc.stac
import planetary_computer
import pystac_client

# Connect to STAC catalog
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

# Set cloud access defaults
odc.stac.configure_rio(
    cloud_defaults=True,
    aws={"aws_unsigned": True},
)

# Build a query and search the STAC catalog for all matching items
bbox = [122.160, -18.05, 122.260, -17.95]
query = catalog.search(
    bbox=bbox,
    collections=["sentinel-2-l2a"],
    datetime="2021/2023",
    query={"eo:cloud_cover": {"lt": 10}},  # Filter to images with <10% cloud
)

# Load data into xarray format
ds_s2 = odc.stac.load(
    items=list(query.items()),
    bands=["red"],
    crs="utm",
    resolution=100,
    groupby="solar_day",
    bbox=bbox,
    fail_on_error=False,
    chunks={},
)

# Inspect output
ds_s2
import odc.stac
import planetary_computer
import pystac_client

# Connect to STAC catalog
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

# Set cloud access defaults
odc.stac.configure_rio(
    cloud_defaults=True,
    aws={"aws_unsigned": True},
)

# Build a query and search the STAC catalog for all matching items
bbox = [122.160, -18.05, 122.260, -17.95]
query = catalog.search(
    bbox=bbox,
    collections=["sentinel-2-l2a"],
    datetime="2021/2023",
    query={"eo:cloud_cover": {"lt": 10}},  # Filter to images with <10% cloud
)

# Load data into xarray format
ds_s2 = odc.stac.load(
    items=list(query.items()),
    bands=["red"],
    crs="utm",
    resolution=100,
    groupby="solar_day",
    bbox=bbox,
    fail_on_error=False,
    chunks={},
)

# Inspect output
ds_s2

Out[2]:

<xarray.Dataset> Size: 6MB
Dimensions:      (y: 112, x: 107, time: 117)
Coordinates:
  * y            (y) float64 896B 8.015e+06 8.015e+06 ... 8.004e+06 8.004e+06
  * x            (x) float64 856B 4.11e+05 4.112e+05 ... 4.216e+05 4.216e+05
    spatial_ref  int32 4B 32751
  * time         (time) datetime64[ns] 936B 2021-01-03T02:04:51.024000 ... 20...
Data variables:
    red          (time, y, x) float32 6MB dask.array<chunksize=(1, 112, 107), meta=np.ndarray>

Using tide_stats¶

Once we have loaded some satellite data, we can pass this to the tide_stats function to calculate local tide statistics and reveal any potential tidal biases. The tide_stats function will return a plain-text summary below, as well as a visual plot that compares the distribution of satellite-observed tides (black dots) against the full range of modelled astronomical tide conditions (blue) using three useful metrics:

Spread: The proportion of the full modelled astronomical tidal range that was observed by satellites. A high value indicating good coverage of the tide range.
High tide offset: The proportion of the highest tides not observed by satellites at any time, as a proportion of the full modelled astronomical tidal range. A high value indicates that the satellite data is biased towards never capturing high tides.
Low tide offset: The proportion of the lowest tides not observed by satellites at any time, as a proportion of the full modelled astronomical tidal range. A high value indicates that the satellite data is biased towards never capturing low tides.

Tip

For a more detailed description of these biases, see Bishop-Taylor et al. 2018.

In [3]:

Copied!





from eo_tides.stats import tide_stats

statistics_df = tide_stats(
    data=ds_s2,
    model=model,
    directory=directory,
)
from eo_tides.stats import tide_stats

statistics_df = tide_stats(
    data=ds_s2,
    model=model,
    directory=directory,
)

Using tide modelling location: 122.21, -18.00
Modelling tides with EOT20
Using tide modelling location: 122.21, -18.00
Modelling tides with EOT20


🌊 Modelled astronomical tide range: 9.36 m (-4.64 to 4.73 m).
🛰️ Observed tide range: 6.31 m (-2.36 to 3.95 m).

🔴 67% of the modelled astronomical tide range was observed at this location.
🟢 The highest 8% (0.78 m) of the tide range was never observed.
🔴 The lowest 24% (2.27 m) of the tide range was never observed.

🌊 Mean modelled astronomical tide height: -0.00 m.
🛰️ Mean observed tide height: 0.95 m.
⬆️ The mean observed tide height was 0.95 m higher than the mean modelled astronomical tide height.

No description has been provided for this image

As we can see in the graph, Sentinel-2 captured a biased proportion of the tide range at this location: only observing ~68% of the tide range, and never observing the lowest 24% of tides.

The tide_stats function also outputs a pandas.Series object containing statistics for the results above, including:

mot: mean tide height observed by the satellite (metres)
mat: mean modelled astronomical tide height (metres)
hot: maximum tide height observed by the satellite (metres)
hat: maximum tide height from modelled astronomical tidal range (metres)
lot: minimum tide height observed by the satellite (metres)
lat: minimum tide height from modelled astronomical tidal range (metres)
otr: tidal range observed by the satellite (metres)
tr: modelled astronomical tide range (metres)
spread: proportion of the full modelled tidal range observed by the satellite
offset_low: proportion of the lowest tides never observed by the satellite
offset_high: proportion of the highest tides never observed by the satellite
y: latitude used for modelling tide heights
x: longitude used for modelling tide heights

In [4]:

Copied!

statistics_df
statistics_df

Out[4]:

mot              0.951000
mat             -0.000000
hot              3.950000
hat              4.728000
lot             -2.361000
lat             -4.636000
otr              6.311000
tr               9.364000
spread           0.674000
offset_low       0.243000
offset_high      0.083000
x              122.209999
y              -18.000000
dtype: float32

Compare against Sentinel-1 tide biases¶

In the previous example, we saw that Sentinel-2 data was biased towards never capturing low tide at our location. These biases are caused by the consistent 10:30 am local overpass time of the Sentinel-2 satellites, which due to a phenomenon called "tide-aliasing" means that certain tides never occur when the satellite overpasses.

One possible way around these biases is to use different satellite data from satellites that overpass at different times. For example, Sentinel-1 radar satellites follow a different orbit to Sentinel-2, overpassing at a local time of 6:00 pm instead of 10:30 am. This difference in overpass time potentially means that Sentinel-1 satellite data may capture different tides to Sentinel-2.

In the next example, we run the tide_stats function on data loaded from Sentinel-1 for the same location and time period to see if this data is affected by the same tide biases.

In [5]:

Copied!





# Build a query and search the STAC catalog for all matching items
bbox = [122.160, -18.05, 122.260, -17.95]
query = catalog.search(
    bbox=bbox,
    collections=["sentinel-1-rtc"],
    datetime="2021/2023",
)

# Load data into xarray format
ds_s1 = odc.stac.load(
    items=list(query.items()),
    bands=["vv"],
    crs="utm",
    resolution=100,
    groupby="solar_day",
    bbox=bbox,
    fail_on_error=False,
    chunks={},
)

# Inspect output
ds_s1
# Build a query and search the STAC catalog for all matching items
bbox = [122.160, -18.05, 122.260, -17.95]
query = catalog.search(
    bbox=bbox,
    collections=["sentinel-1-rtc"],
    datetime="2021/2023",
)

# Load data into xarray format
ds_s1 = odc.stac.load(
    items=list(query.items()),
    bands=["vv"],
    crs="utm",
    resolution=100,
    groupby="solar_day",
    bbox=bbox,
    fail_on_error=False,
    chunks={},
)

# Inspect output
ds_s1

Out[5]:

<xarray.Dataset> Size: 4MB
Dimensions:      (y: 112, x: 107, time: 89)
Coordinates:
  * y            (y) float64 896B 8.015e+06 8.015e+06 ... 8.004e+06 8.004e+06
  * x            (x) float64 856B 4.11e+05 4.112e+05 ... 4.216e+05 4.216e+05
    spatial_ref  int32 4B 32751
  * time         (time) datetime64[ns] 712B 2021-01-07T21:23:20.996123 ... 20...
Data variables:
    vv           (time, y, x) float32 4MB dask.array<chunksize=(1, 112, 107), meta=np.ndarray>

When we run tide_stats, we can see a very different pattern: Sentinel-1 data is biased towards low tide observations, and never observes high tide at our location!

In [6]:

Copied!





statistics_df = tide_stats(
    data=ds_s1,
    model=model,
    directory=directory,
)
statistics_df = tide_stats(
    data=ds_s1,
    model=model,
    directory=directory,
)

Using tide modelling location: 122.21, -18.00
Modelling tides with EOT20
Using tide modelling location: 122.21, -18.00
Modelling tides with EOT20


🌊 Modelled astronomical tide range: 9.53 m (-4.81 to 4.73 m).
🛰️ Observed tide range: 6.40 m (-4.38 to 2.02 m).

🔴 67% of the modelled astronomical tide range was observed at this location.
🔴 The highest 28% (2.70 m) of the tide range was never observed.
🟢 The lowest 4% (0.42 m) of the tide range was never observed.

🌊 Mean modelled astronomical tide height: -0.00 m.
🛰️ Mean observed tide height: -1.31 m.
⬇️ The mean observed tide height was -1.31 m lower than the mean modelled astronomical tide height.

Calculate biases for multiple satellite sensors¶

At our location, Sentinel-2 optical satellites are biased towards high tide observations, while Sentinel-1 radar satellites are biased towards low tide observations. Could combining data from multiple EO sensors help us capture a more complete view of tides at this location?

To test this theory, we can combine Sentinel-2 and Sentinel-1 data into a single xarray.Dataset, recording the name of each sensor using a new satellite_name coordinate in our data:

In [7]:

Copied!





import xarray as xr

# Give each observation a "satellite_name" based on its satellite
ds_s1 = ds_s1.assign_coords(satellite_name=("time", ["Sentinel-1"] * ds_s1.sizes["time"]))
ds_s2 = ds_s2.assign_coords(satellite_name=("time", ["Sentinel-2"] * ds_s2.sizes["time"]))

# Combine both Sentinel-1 and Sentinel-2 data into a single dataset
ds_all = xr.concat([ds_s1, ds_s2], dim="time")
ds_all
import xarray as xr

# Give each observation a "satellite_name" based on its satellite
ds_s1 = ds_s1.assign_coords(satellite_name=("time", ["Sentinel-1"] * ds_s1.sizes["time"]))
ds_s2 = ds_s2.assign_coords(satellite_name=("time", ["Sentinel-2"] * ds_s2.sizes["time"]))

# Combine both Sentinel-1 and Sentinel-2 data into a single dataset
ds_all = xr.concat([ds_s1, ds_s2], dim="time")
ds_all

Out[7]:

<xarray.Dataset> Size: 20MB
Dimensions:         (time: 206, y: 112, x: 107)
Coordinates:
  * y               (y) float64 896B 8.015e+06 8.015e+06 ... 8.004e+06 8.004e+06
  * x               (x) float64 856B 4.11e+05 4.112e+05 ... 4.216e+05 4.216e+05
    spatial_ref     int32 4B 32751
  * time            (time) datetime64[ns] 2kB 2021-01-07T21:23:20.996123 ... ...
    satellite_name  (time) <U10 8kB 'Sentinel-1' 'Sentinel-1' ... 'Sentinel-2'
Data variables:
    vv              (time, y, x) float32 10MB dask.array<chunksize=(1, 112, 107), meta=np.ndarray>
    red             (time, y, x) float32 10MB dask.array<chunksize=(1, 112, 107), meta=np.ndarray>

We can now run tide_stats again. This time, we pass our satellite name coordinate to the function using the plot_var="satellite_name" parameter. This will plot data from each of our satellites using a different symbol and colour.

In [8]:

Copied!





statistics_df = tide_stats(
    data=ds_all,
    plot_var="satellite_name",
    model=model,
    directory=directory,
)
statistics_df = tide_stats(
    data=ds_all,
    plot_var="satellite_name",
    model=model,
    directory=directory,
)

Using tide modelling location: 122.21, -18.00
Modelling tides with EOT20
Using tide modelling location: 122.21, -18.00
Modelling tides with EOT20


🌊 Modelled astronomical tide range: 9.36 m (-4.64 to 4.73 m).
🛰️ Observed tide range: 8.33 m (-4.38 to 3.95 m).

🟡 89% of the modelled astronomical tide range was observed at this location.
🟢 The highest 8% (0.78 m) of the tide range was never observed.
🟢 The lowest 3% (0.26 m) of the tide range was never observed.

🌊 Mean modelled astronomical tide height: -0.00 m.
🛰️ Mean observed tide height: -0.02 m.
⬇️ The mean observed tide height was -0.02 m lower than the mean modelled astronomical tide height.

We can see that at this location, combining Sentinel-2 and Sentinel-1 data greatly improves our biases: our satellite data now covers ~89% of the modelled astronomical tide range, and only fails to observe 8% of the highest tides and 2% of the lowest tides!

Using pixel_stats¶

Modelling tide statistics and biases spatially¶

Because tide regimes and satellite biases can vary greatly along the coast, it can be useful to plot these biases spatially. To do this, we can use the pixel_stats function.

pixel_stats works similarly to tide_stats, except that statistics are calculated across the entire extent of your satellite dataset. The function will generate an xarray.Dataset output containing the statistics discussed above as two-dimensional arrays.

Tip

The pixel_stats function uses eo_tides.eo.pixel_tides to model tides spatially. You can experiment passing in parameters like resolution and buffer to customise the modelling grid used for calculating tide biases. Be warned however that you can quickly run out of memory with large analyses, given the number of timesteps required to model astronomical low and high tide.

We can plot these statistics to inspect any spatial tide bias patterns:

In [10]:

Copied!

import matplotlib.pyplot as plt

stats_ds.spread.plot.imshow()
plt.gca().set_title("Spread (%)");
import matplotlib.pyplot as plt

stats_ds.spread.plot.imshow()
plt.gca().set_title("Spread (%)");

In [11]:

Copied!

stats_ds.offset_low.plot.imshow()
plt.gca().set_title("Low tide offset (%)");
stats_ds.offset_low.plot.imshow()
plt.gca().set_title("Low tide offset (%)");

In [12]:

Copied!

stats_ds.offset_high.plot.imshow()
plt.gca().set_title("High tide offset (%)");
stats_ds.offset_high.plot.imshow()
plt.gca().set_title("High tide offset (%)");

Tide aliasing for satellite sensors¶

The biases above are caused by interactions between satellite overpasses and the natural cycles of tidal constituents called tide aliasing. Tide aliasing occurs because EO satellites typically capture imagery of the coastline at a frequency that does not align with the frequencies of the many harmonic tidal constituents that produce tides.

The aliasing period of a tidal constituent describes how long it would take for a satellite to sample the entire tidal cycle for each constituent, based on the satellite's observation frequency.

We can use the eo_tides.stats.tide_aliasing function to estimate aliasing periods for tide constituents, based on the overpass frequency of common EO satellites:

Short aliasing periods mean the satellite will observe the full range of tidal variation relatively quickly, reducing the risk of tide-related bias.
Long aliasing periods indicate that it will take much longer to sample all tidal conditions, increasing the risk that satellite analyses may misrepresent tidal dynamics.

Note

Revisit periods are approximate and based on nominal repeat cycles at the equator. Actual observation frequency may vary due to latitude, cloud cover, sensor availability, and acquisition planning.

In [13]:

Copied!

from eo_tides.stats import tide_aliasing

tide_aliasing(satellites=["landsat", "sentinel-2"], style=False)
from eo_tides.stats import tide_aliasing

tide_aliasing(satellites=["landsat", "sentinel-2"], style=False)

Using 8 day revisit for landsat
Using 5 day revisit for sentinel-2

Out[13]:

	name	type	period	aliasing_period
				landsat	sentinel-2
m2	principal lunar semidiurnal	semidiurnal	0.518	17.460	14.765
s2	principal solar semidiurnal	semidiurnal	0.500	inf	inf
k1	lunar-solar diurnal	diurnal	0.997	365.242	365.242
o1	principal lunar diurnal	diurnal	1.076	18.337	14.192
n2	larger lunar elliptic semidiurnal	semidiurnal	0.527	47.660	10.419
p1	principal solar diurnal	diurnal	1.003	365.242	365.242
k2	lunar-solar semidiurnal	semidiurnal	0.499	182.621	182.621
q1	larger lunar elliptic diurnal	diurnal	1.120	54.812	10.725
2n2	lunar elliptic semidiurnal second order	semidiurnal	0.538	65.318	16.753
mu2	variational constituent	semidiurnal	0.536	95.668	15.493
nu2	larger lunar evectional	semidiurnal	0.526	38.701	10.085
l2	smaller lunar elliptic semidiurnal	semidiurnal	0.508	31.812	31.812
t2	larger solar elliptic	semidiurnal	0.501	365.242	365.242
j1	smaller lunar elliptic diurnal	diurnal	0.962	25.622	25.622
m1	smaller lunar elliptic diurnal	diurnal	1.035	29.531	29.531
oo1	lunar diurnal	diurnal	0.929	20.383	13.168
rho1	larger lunar evectional diurnal	diurnal	1.113	43.288	10.194
mf	lunisolar fortnightly	long period	13.661	19.306	13.661
mm	lunar monthly	long period	27.555	27.555	27.555
ssa	solar semiannual	long period	182.627	182.621	182.621
m4	shallow water overtides of principal lunar	shallow water	0.259	95.668	15.493
ms4	shallow water quarter diurnal	shallow water	0.254	17.460	14.765
mn4	shallow water quarter diurnal	shallow water	0.261	21.393	35.391
m6	shallow water overtides of principal lunar	shallow water	0.173	21.358	314.549
m8	shallow water eighth diurnal	shallow water	0.129	47.834	14.103
mk3	shallow water terdiurnal	shallow water	0.341	16.663	15.387
s6	shallow water overtides of principal solar	shallow water	0.167	inf	inf
2sm2	shallow water semidiurnal	semidiurnal	0.484	17.460	14.765
2mk3	shallow water terdiurnal	shallow water	0.349	75.811	16.179
msf	lunisolar synodic fortnightly	long period	14.765	17.460	14.765
sa	solar annual	long period	365.259	365.242	365.242
mt	meteorological tides	long period	9.133	64.491	11.049
2q1	lunar elliptic diurnal second order	diurnal	1.167	55.409	17.559

In the examples above, we can see several interesting results:

Constituents like m2 and o1 can be observed effectively by EO satellites, requiring only 14-18 days of data to capture the full tide cycle.
Constituents like sa, k1, p1 require a full year of data to be observed effectively, meaning shorter EO analysis will capture a biased portion of the tide cycle.
Constituents like s2 have an infinite aliasing period. This means that no matter how long a satellite observes the coast, it will never capture the entire tidal cycle for this constituent.

The tide_aliasing function can be customised to return results in different time units (e.g. years), or specific tide constituents only:

In [14]:

Copied!





tide_aliasing(
    satellites=["landsat", "sentinel-2"],
    c=["m2", "s2", "k1", "o1", "p1", "sa"],
    units="years",
    style=False,
)
tide_aliasing(
    satellites=["landsat", "sentinel-2"],
    c=["m2", "s2", "k1", "o1", "p1", "sa"],
    units="years",
    style=False,
)

Using 8 day revisit for landsat
Using 5 day revisit for sentinel-2

Out[14]:

	name	type	period	aliasing_period
				landsat	sentinel-2
m2	principal lunar semidiurnal	semidiurnal	0.001	0.048	0.040
s2	principal solar semidiurnal	semidiurnal	0.001	inf	inf
k1	lunar-solar diurnal	diurnal	0.003	1.000	1.000
o1	principal lunar diurnal	diurnal	0.003	0.050	0.039
p1	principal solar diurnal	diurnal	0.003	1.000	1.000
sa	solar annual	long period	1.000	1.000	1.000

To run the function for satellites that are not supported by default, provide a dictionary to satellites with the overpass frequency in days:

In [15]:

Copied!





tide_aliasing(
    satellites={"custom-sat": 3},
    c=["m2", "s2", "k1", "o1", "p1", "sa"],
    style=False,
)
tide_aliasing(
    satellites={"custom-sat": 3},
    c=["m2", "s2", "k1", "o1", "p1", "sa"],
    style=False,
)

Using 3 day revisit for custom-sat

Out[15]:

	name	type	period	aliasing_period
				custom-sat
m2	principal lunar semidiurnal	semidiurnal	0.518	14.765
s2	principal solar semidiurnal	semidiurnal	0.500	inf
k1	lunar-solar diurnal	diurnal	0.997	365.242
o1	principal lunar diurnal	diurnal	1.076	14.192
p1	principal solar diurnal	diurnal	1.003	365.242
sa	solar annual	long period	365.259	365.242

Next steps¶

We have explored calculating tide statistics and biases in EO data. Now we can learn how to validate modelled tides against measured tide gauge data to ensure the tides we are modelling are accurate.