Validating modelled tide heights

This guide demonstrates how to use the load_gauge_gesla function from eo_tides.validation to validate modelled tides from eo-tides using tide gauge data.

The tide models used by eo-tides can vary significantly in accuracy across the world's coastlines. Evaluating the accuracy of your modelled tides is critical for ensuring that resulting marine or coastal EO analyses are reliable and useful.

The load_gauge_gesla function provides a convenient tool for loading high-quality sea-level measurements from the GESLA (Global Extreme Sea Level Analysis) archive, a global archive of 90,713 years of sea-level data from 5,119 records across the world. This data can be compared against tides modelled using eo-tides to calculate the accuracy of your tide modelling and identify the optimal tide models to use for your study area.

Getting started

As in the previous examples, our first step is to tell eo-tides the location of our tide model directory (if you haven't set this up, refer to the setup instructions):

In [1]:

directory = "../../tests/data/tide_models/"

Example modelled tides

First, we can model hourly tides for a location (Broome, Western Australia) and time period (January 2018) of interest using the eo_tides.model.model_tides function:

In [2]:

import pandas as pd

from eo_tides.model import model_tides

x, y = 122.2186, -18.0008
start_time = "2018-01-01"
end_time = "2018-01-31"

modelled_df = model_tides(
    x=x,
    y=y,
    time=pd.date_range(start=start_time, end=end_time, freq="1h"),
    directory=directory,
)

# Print outputs
modelled_df.head()

Modelling tides with EOT20

Out[2]:

			tide_model	tide_height
time	x	y
2018-01-01 00:00:00	122.2186	-18.0008	EOT20	1.225622
2018-01-01 01:00:00	122.2186	-18.0008	EOT20	2.159491
2018-01-01 02:00:00	122.2186	-18.0008	EOT20	2.474313
2018-01-01 03:00:00	122.2186	-18.0008	EOT20	2.111740
2018-01-01 04:00:00	122.2186	-18.0008	EOT20	1.182326
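
As a quick check on the time range, the short sketch below (pandas only, no tide models required) repeats the pd.date_range call used above and inspects the result. It suggests that the end date is treated as midnight at the start of 31 January, so the series contains 721 hourly timesteps and the final day is represented by a single timestep.

```python
import pandas as pd

# Same call as in the modelling cell above
times = pd.date_range(start="2018-01-01", end="2018-01-31", freq="1h")

# 30 full days of hourly steps plus the final midnight timestamp
n_steps = len(times)
first, last = times[0], times[-1]

print(n_steps, first, last)
```

If the whole of 31 January is needed, one option is to pass a later end time such as "2018-01-31 23:00".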

Loading GESLA tide gauge data

To evaluate the accuracy of these modelled tides, we can load measured sea-level data from the nearest GESLA tide gauge using load_gauge_gesla.

To obtain GESLA data, you will need to download both "GESLA-3 DATA" (GESLA3.0_ALL.zip) and "GESLA-3 CSV META-DATA FILE" (GESLA3_ALL 2.csv) from the Downloads page of the GESLA website, and save and extract these to a convenient location.

We have provided an example below; replace these paths to point to your downloaded data.

In [3]:

gesla_data_path = "../../tests/data/GESLA3.0_ALL/"
gesla_metadata_path = "../../tests/data/GESLA3_ALL 2.csv"

To load GESLA measured sea-level data for our location, we can pass in the same x and y location and time period that we originally used to model our tides. This ensures that we load only the gauge data we actually need.

Tip

The load_gauge_gesla function will automatically identify the nearest GESLA tide gauge to an x, y coordinate; pass a set of bounding box tuples (e.g. x=(120, 130), y=(-20, -30)) to return all tide gauges within a bounding box instead.
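
To make the two selection modes in the tip concrete, here is a minimal sketch of nearest-point and bounding-box selection using NumPy. It is not the implementation used by load_gauge_gesla: the haversine_km helper, the gauge names, and the gauge coordinates are all hypothetical and exist only for illustration.

```python
import numpy as np


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


# Hypothetical gauge positions for illustration only (not taken from GESLA metadata)
gauge_names = np.array(["gauge_a", "gauge_b", "gauge_c"])
gauge_lons = np.array([122.2, 121.0, 130.8])
gauge_lats = np.array([-18.0, -25.0, -12.5])

# Point of interest used in this guide
x, y = 122.2186, -18.0008

# Nearest-gauge selection: smallest great-circle distance to the point
distances = haversine_km(x, y, gauge_lons, gauge_lats)
nearest = gauge_names[np.argmin(distances)]

# Bounding-box selection; min/max make the order within each tuple irrelevant
x_bounds, y_bounds = (120, 130), (-20, -30)
in_box = (
    (gauge_lons >= min(x_bounds))
    & (gauge_lons <= max(x_bounds))
    & (gauge_lats >= min(y_bounds))
    & (gauge_lats <= max(y_bounds))
)

print(nearest, gauge_names[in_box])
```

Note that with the example bounds from the tip, a gauge at latitude -18 falls outside the box, so a point query and a bounding-box query can return different gauges.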

In [4]:

from eo_tides.validation import load_gauge_gesla

# Load gauge data
gauge_df = load_gauge_gesla(
    x=x,
    y=y,
    time=(start_time, end_time),
    correct_mean=True,
    data_path=gesla_data_path,
    metadata_path=gesla_metadata_path,
)
gauge_df.head()

Loading GESLA gauges: 100%|██████████| 1/1 [00:00<00:00, 46.17it/s]

Out[4]:

		sea_level	qc_flag	use_flag	file_name	site_name	country	contributor_abbreviated	contributor_full	contributor_website	contributor_contact	...	start_date_time	end_date_time	number_of_years	time_zone_hours	datum_information	instrument	precision	null_value	gauge_type	overall_record_quality
site_code	time
62650	2018-01-01 00:00:00	1.208329	1	1	../../tests/data/GESLA3.0_ALL/broome-62650-aus...	Broome	AUS	BOM	Bureau of Meteorology	http://www.bom.gov.au/oceanography/projects/nt...	tides@bom.gov.au	...	2/07/1966 0:00	31/12/2019 23:00	51	0	Chart Datum / Lowest Astronomical Tide	Unspecified	Unspecified	-99.9999	Coastal	No obvious issues
	2018-01-01 01:00:00	2.311329	1	1	../../tests/data/GESLA3.0_ALL/broome-62650-aus...	Broome	AUS	BOM	Bureau of Meteorology	http://www.bom.gov.au/oceanography/projects/nt...	tides@bom.gov.au	...	2/07/1966 0:00	31/12/2019 23:00	51	0	Chart Datum / Lowest Astronomical Tide	Unspecified	Unspecified	-99.9999	Coastal	No obvious issues
	2018-01-01 02:00:00	2.712329	1	1	../../tests/data/GESLA3.0_ALL/broome-62650-aus...	Broome	AUS	BOM	Bureau of Meteorology	http://www.bom.gov.au/oceanography/projects/nt...	tides@bom.gov.au	...	2/07/1966 0:00	31/12/2019 23:00	51	0	Chart Datum / Lowest Astronomical Tide	Unspecified	Unspecified	-99.9999	Coastal	No obvious issues
	2018-01-01 03:00:00	2.137329	1	1	../../tests/data/GESLA3.0_ALL/broome-62650-aus...	Broome	AUS	BOM	Bureau of Meteorology	http://www.bom.gov.au/oceanography/projects/nt...	tides@bom.gov.au	...	2/07/1966 0:00	31/12/2019 23:00	51	0	Chart Datum / Lowest Astronomical Tide	Unspecified	Unspecified	-99.9999	Coastal	No obvious issues
	2018-01-01 04:00:00	1.049329	1	1	../../tests/data/GESLA3.0_ALL/broome-62650-aus...	Broome	AUS	BOM	Bureau of Meteorology	http://www.bom.gov.au/oceanography/projects/nt...	tides@bom.gov.au	...	2/07/1966 0:00	31/12/2019 23:00	51	0	Chart Datum / Lowest Astronomical Tide	Unspecified	Unspecified	-99.9999	Coastal	No obvious issues

5 rows × 26 columns

We have successfully loaded data for the Broome tide gauge (GESLA site code 62650)! We can now plot sea levels over time. Note that the gauge dataset is missing some data in late January 2018:

In [5]:

gauge_df.droplevel("site_code").sea_level.plot()

Out[5]:

<Axes: xlabel='time'>

Validation against GESLA tide gauges

Now that we have modelled some tides and loaded some measured sea-level data, we can compare them. Because the timeseries above is missing some data in late January, we first need to join our modelled tides (modelled_df) to the timesteps present in gauge_df, so that only timesteps with a gauge measurement are compared.
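
The effect of this join can be illustrated with a small, self-contained sketch. The values below are synthetic and the frames use a plain time index rather than the multi-level indexes returned by eo-tides, so treat it as an illustration of the pandas behaviour rather than a reproduction of the real data.

```python
import numpy as np
import pandas as pd

# Synthetic hourly "modelled" series with no gaps
times = pd.date_range("2018-01-01", periods=6, freq="1h")
modelled = pd.DataFrame(
    {"tide_height": [1.2, 2.2, 2.5, 2.1, 1.2, 0.1]}, index=times
)
modelled.index.name = "time"

# Synthetic "gauge" series: two timesteps absent, one present but empty (NaN)
gauge = pd.DataFrame(
    {"sea_level": [1.2, 2.3, np.nan, 1.0]}, index=times[[0, 1, 3, 4]]
)
gauge.index.name = "time"

# The default left join keeps only the gauge timesteps;
# dropna then removes rows where either value is missing
joined = gauge.join(modelled).dropna()

print(joined)
```

Only the three timesteps with both a gauge measurement and a modelled value survive, which is what makes a like-for-like comparison possible.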

Now let's generate a scatterplot with our measured data on the x-axis, and our modelled tides on the y-axis:

In [6]:

import matplotlib.pyplot as plt

# Join our modelled data to the timesteps in our gauge data
joined_df = gauge_df.join(modelled_df).dropna()

# Plot as a scatterplot with 1:1 line
ax = joined_df.plot.scatter(x="sea_level", y="tide_height")
plt.plot([-5, 5], [-5, 5], c="red", linestyle="dashed")
ax.set_aspect(1.0)
ax.set_ylabel("Modelled tide (m)")
ax.set_xlabel("GESLA sea level (m)");

We can see that the two datasets are highly correlated. To quantify this, we can use the eo_tides.validation.eval_metrics function to compare them and calculate some useful accuracy statistics, including correlation, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), R-squared, bias and regression slope.

Our results show that our modelled tides closely reproduced observed sea levels at this location:

In [7]:

from eo_tides.validation import eval_metrics

# Calculate accuracy metrics
accuracy_metrics = eval_metrics(x=joined_df.sea_level, y=joined_df.tide_height)
accuracy_metrics

Out[7]:

Correlation         0.997
RMSE                0.160
MAE                 0.126
R-squared           0.994
Bias               -0.005
Regression slope    0.981
dtype: float64
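
For readers who want to see what statistics like these measure, the sketch below computes comparable values with NumPy. It is not the eval_metrics implementation: the simple_metrics helper is hypothetical, and it assumes that bias is the mean of modelled minus observed and that R-squared is the squared Pearson correlation. The inputs are the first five hourly values shown in the tables above, so the results will differ from the full-month statistics.

```python
import numpy as np


def simple_metrics(observed, modelled):
    """Hypothetical helper: basic agreement statistics between two series."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)

    # Assumed sign convention: positive error means the model is too high
    error = modelled - observed
    corr = np.corrcoef(observed, modelled)[0, 1]

    return {
        "Correlation": corr,
        "RMSE": np.sqrt(np.mean(error**2)),
        "MAE": np.mean(np.abs(error)),
        "R-squared": corr**2,  # assumption: squared Pearson correlation
        "Bias": np.mean(error),
        # Slope of a least-squares line fitted to modelled vs observed
        "Regression slope": np.polyfit(observed, modelled, deg=1)[0],
    }


# First five hourly values from the gauge and EOT20 tables above
gesla = [1.208329, 2.311329, 2.712329, 2.137329, 1.049329]
eot20 = [1.225622, 2.159491, 2.474313, 2.111740, 1.182326]

metrics = simple_metrics(gesla, eot20)
for name, value in metrics.items():
    print(f"{name:<18}{value:.3f}")
```

RMSE penalises large errors more heavily than MAE, so a gap between the two can hint at occasional large mismatches rather than a uniform offset.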

Identifying best local tide models

Because different ocean tide models can perform better or worse in different locations, it can be valuable to compare the accuracy of different models against measured gauge data. This can help us make an informed decision about the best model to use for a given application or study area.

In the example below, we will use model_tides to model tides using three different models: EOT20, GOT5.5, and HAMTIDE11:

In [8]:

models = ["EOT20", "GOT5.5", "HAMTIDE11"]

modelled_df = model_tides(
    x=x,
    y=y,
    time=pd.date_range(start=start_time, end=end_time, freq="1h"),
    model=models,
    output_format="wide",
    directory=directory,
)
modelled_df.head()

Modelling tides with EOT20, GOT5.5, HAMTIDE11 in parallel (models: 3, splits: 1)

100%|██████████| 3/3 [00:00<00:00, 23.34it/s]

Converting to a wide format dataframe

Out[8]:

		tide_model	EOT20	GOT5.5	HAMTIDE11
time	x	y
2018-01-01 00:00:00	122.2186	-18.0008	1.225622	1.298427	1.419364
2018-01-01 01:00:00	122.2186	-18.0008	2.159491	2.287205	2.298982
2018-01-01 02:00:00	122.2186	-18.0008	2.474313	2.618187	2.535022
2018-01-01 03:00:00	122.2186	-18.0008	2.111740	2.228044	2.072335
2018-01-01 04:00:00	122.2186	-18.0008	1.182326	1.241291	1.035950
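
The difference between the default long output and the wide output requested here can be sketched with pandas alone. The example below rebuilds a small long-format table from the first two timesteps shown above and pivots it so that each model becomes a column. This is one way to do the reshaping, not necessarily how eo-tides performs it internally.

```python
import pandas as pd

times = pd.to_datetime(["2018-01-01 00:00", "2018-01-01 01:00"])

# Modelled heights for the first two timesteps, copied from the table above
heights = {
    "EOT20": [1.225622, 2.159491],
    "GOT5.5": [1.298427, 2.287205],
    "HAMTIDE11": [1.419364, 2.298982],
}

# Long format: one row per timestep and model
rows = [
    {"time": t, "x": 122.2186, "y": -18.0008, "tide_model": model, "tide_height": h}
    for model, values in heights.items()
    for t, h in zip(times, values)
]
long_df = pd.DataFrame(rows).set_index(["time", "x", "y"])

# Wide format: move tide_model into the index, then unstack it into columns
wide_df = long_df.set_index("tide_model", append=True)["tide_height"].unstack(
    "tide_model"
)

print(wide_df)
```

The wide layout is convenient here because each model can then be selected by column name when comparing against the gauge data.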

We can now merge these modelled tides with our measured gauge data:

In [9]:

# Join our modelled data to the timesteps in our gauge data
joined_df = gauge_df.join(modelled_df).dropna()

# Plot measured sea levels and modelled data
columns = ["sea_level", *models]
joined_df.droplevel(["site_code", "x", "y"]).filter(columns).plot()

Out[9]:

<Axes: xlabel='time'>

Now, we can loop through each of our models and calculate accuracy metrics compared to our gauge data for each of them:

In [10]:

# Calculate accuracy metrics for each model
accuracy_dict = {}
for model in models:
    accuracy_dict[model] = eval_metrics(x=joined_df.sea_level, y=joined_df[model])

# Merge into a single dataframe
combined_accuracy_df = pd.DataFrame.from_dict(accuracy_dict)
combined_accuracy_df

Out[10]:

	EOT20	GOT5.5	HAMTIDE11
Correlation	0.997	0.997	0.993
RMSE	0.160	0.151	0.236
MAE	0.126	0.118	0.190
R-squared	0.994	0.994	0.986
Bias	-0.005	-0.010	-0.011
Regression slope	0.981	0.991	0.965

As we can see above, at this location GOT5.5 has the best overall accuracy as measured by RMSE (0.151 m) and MAE (0.118 m), while results from HAMTIDE11 are less accurate (RMSE 0.236 m) and slightly less correlated with our measured gauge data (0.993 compared to 0.997).
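
When comparing many models, it can help to pick the best one programmatically rather than by eye. The sketch below rebuilds the accuracy table from the values printed above and ranks the models by RMSE. Choosing RMSE as the ranking metric is an assumption made for illustration; other applications may prefer MAE or bias.

```python
import pandas as pd

# Values copied from the accuracy table above
combined_accuracy_df = pd.DataFrame(
    {
        "EOT20": [0.997, 0.160, 0.126, 0.994, -0.005, 0.981],
        "GOT5.5": [0.997, 0.151, 0.118, 0.994, -0.010, 0.991],
        "HAMTIDE11": [0.993, 0.236, 0.190, 0.986, -0.011, 0.965],
    },
    index=["Correlation", "RMSE", "MAE", "R-squared", "Bias", "Regression slope"],
)

# Rank models from lowest to highest RMSE (lower is better)
ranked = combined_accuracy_df.loc["RMSE"].sort_values()
best_model = ranked.index[0]

print(ranked)
print("Lowest RMSE:", best_model)
```

In the notebook itself, the same two lines can be applied directly to the combined_accuracy_df produced by the loop above.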