Skip to content

API reference

eo_tides.model

Functions:

Name Description
ensemble_tides

Combine multiple tide models into a single locally optimised

model_phases

Model tide phases (low-flow, high-flow, high-ebb, low-ebb)

model_tides

Model tide heights at multiple coordinates and/or timesteps

ensemble_tides

ensemble_tides(
    tide_df,
    crs,
    ensemble_models,
    ensemble_func=None,
    ensemble_top_n=3,
    ranking_points="https://dea-public-data-dev.s3-ap-southeast-2.amazonaws.com/derivative/dea_intertidal/supplementary/rankings_ensemble_2017-2019.fgb",
    ranking_valid_perc=0.02,
    **idw_kwargs
)

Combine multiple tide models into a single locally optimised ensemble tide model using external model ranking data (e.g. satellite altimetry or NDWI-tide correlations along the coastline) to inform the selection of the best local models.

This function performs the following steps:

  1. Takes a dataframe of tide heights from multiple tide models, as produced by eo_tides.model.model_tides
  2. Loads model ranking points from an external file, filters them based on the valid data percentage, and retains relevant columns
  3. Interpolates the model rankings into the coordinates of the original dataframe using Inverse Weighted Interpolation (IDW)
  4. Uses rankings to combine multiple tide models into a single optimised ensemble model (by default, by taking the mean of the top 3 ranked models)
  5. Returns a new dataframe with the combined ensemble model predictions

Parameters:

Name Type Description Default

tide_df

DataFrame

DataFrame produced by eo_tides.model.model_tides, containing tide model predictions in long format with columns: ["time", "x", "y", "tide_height", "tide_model"].

required

crs

string

Coordinate reference system for the "x" and "y" coordinates in tide_df. Used to ensure that interpolations are performed in the correct CRS.

required

ensemble_models

list

A list of models to include in the ensemble modelling process. All values must exist as columns with the prefix "rank_" in ranking_points.

required

ensemble_func

dict

By default, a simple ensemble model will be calculated by taking the mean of the ensemble_top_n tide models at each location. However, a dictionary containing more complex ensemble calculations can also be provided. Dictionary keys are used to name output ensemble models; functions should take a column named "rank" and convert it to a weighting, e.g.: ensemble_func = {"ensemble-custom": lambda x: x["rank"] <= 3}

None

ensemble_top_n

int

If ensemble_func is None, this sets the number of top models to include in the mean ensemble calculation. Defaults to 3.

3

ranking_points

str

Path to the file containing model ranking points. This dataset should include columns containing rankings for each tide model, named with the prefix "rank_". e.g. "rank_EOT20". Low values should represent high rankings (e.g. 1 = top ranked). The default value points to an example file covering Australia.

'https://dea-public-data-dev.s3-ap-southeast-2.amazonaws.com/derivative/dea_intertidal/supplementary/rankings_ensemble_2017-2019.fgb'

ranking_valid_perc

float

Minimum percentage of valid data required to include a model rank point in the analysis, as defined in a column named "valid_perc". Defaults to 0.02.

0.02

**idw_kwargs

Optional keyword arguments to pass to the idw function used for interpolation. Useful values include k (number of nearest neighbours to use in interpolation), max_dist (maximum distance to nearest neighbours), and k_min (minimum number of neighbours required after max_dist is applied).

{}

Returns:

Type Description
DataFrame

DataFrame containing the ensemble model predictions, matching the format of the input tide_df (e.g. columns ["time", "x", "y", "tide_height", "tide_model"]. By default the 'tide_model' column will be labeled "ensemble" for the combined model predictions (but if a custom dictionary of ensemble functions is provided via ensemble_func, each ensemble will be named using the provided dictionary keys).

Source code in eo_tides/model.py
def ensemble_tides(
    tide_df,
    crs,
    ensemble_models,
    ensemble_func=None,
    ensemble_top_n=3,
    ranking_points="https://dea-public-data-dev.s3-ap-southeast-2.amazonaws.com/derivative/dea_intertidal/supplementary/rankings_ensemble_2017-2019.fgb",
    ranking_valid_perc=0.02,
    **idw_kwargs,
):
    """Combine multiple tide models into a single locally optimised
    ensemble tide model using external model ranking data (e.g.
    satellite altimetry or NDWI-tide correlations along the coastline)
    to inform the selection of the best local models.

    This function performs the following steps:

    1. Takes a dataframe of tide heights from multiple tide models, as
       produced by `eo_tides.model.model_tides`
    2. Loads model ranking points from an external file, filters them
       based on the valid data percentage, and retains relevant columns
    3. Interpolates the model rankings into the coordinates of the
       original dataframe using Inverse Weighted Interpolation (IDW)
    4. Uses rankings to combine multiple tide models into a single
       optimised ensemble model (by default, by taking the mean of the
       top 3 ranked models)
    5. Returns a new dataframe with the combined ensemble model predictions

    Parameters
    ----------
    tide_df : pandas.DataFrame
        DataFrame produced by `eo_tides.model.model_tides`, containing
        tide model predictions in long format with columns:
        `["time", "x", "y", "tide_height", "tide_model"]`.
    crs : string
        Coordinate reference system for the "x" and "y" coordinates in
        `tide_df`. Used to ensure that interpolations are performed
        in the correct CRS.
    ensemble_models : list
        A list of models to include in the ensemble modelling process.
        All values must exist as columns with the prefix "rank_" in
        `ranking_points`.
    ensemble_func : dict, optional
        By default, a simple ensemble model will be calculated by taking
        the mean of the `ensemble_top_n` tide models at each location.
        However, a dictionary containing more complex ensemble
        calculations can also be provided. Dictionary keys are used
        to name output ensemble models; functions should take a column
        named "rank" and convert it to a weighting, e.g.:
        `ensemble_func = {"ensemble-custom": lambda x: x["rank"] <= 3}`
    ensemble_top_n : int, optional
        If `ensemble_func` is None, this sets the number of top models
        to include in the mean ensemble calculation. Defaults to 3.
    ranking_points : str, optional
        Path to the file containing model ranking points. This dataset
        should include columns containing rankings for each tide
        model, named with the prefix "rank_". e.g. "rank_EOT20".
        Low values should represent high rankings (e.g. 1 = top ranked).
        The default value points to an example file covering Australia.
    ranking_valid_perc : float, optional
        Minimum percentage of valid data required to include a model
        rank point in the analysis, as defined in a column named
        "valid_perc". Defaults to 0.02.
    **idw_kwargs
        Optional keyword arguments to pass to the `idw` function used
        for interpolation. Useful values include `k` (number of nearest
        neighbours to use in interpolation), `max_dist` (maximum
        distance to nearest neighbours), and `k_min` (minimum number of
        neighbours required after `max_dist` is applied).

    Returns
    -------
    pandas.DataFrame
        DataFrame containing the ensemble model predictions, matching
        the format of the input `tide_df` (e.g. columns `["time", "x",
        "y", "tide_height", "tide_model"]`. By default the 'tide_model'
        column will be labeled "ensemble" for the combined model
        predictions (but if a custom dictionary of ensemble functions is
        provided via `ensemble_func`, each ensemble will be named using
        the provided dictionary keys).

    """
    # Raise data if `tide_df` provided in wide format
    if "tide_model" not in tide_df:
        raise Exception(
            "`tide_df` does not contain the expected 'tide_model' and "
            "'tide_height' columns. Ensure that tides were modelled in "
            "long format (i.e. `output_format='long'` in `model_tides`)."
        )

    # Extract x and y coords from dataframe
    x = tide_df.index.get_level_values(level="x")
    y = tide_df.index.get_level_values(level="y")

    # Load model ranks points and reproject to same CRS as x and y
    model_ranking_cols = [f"rank_{m}" for m in ensemble_models]
    try:
        model_ranks_gdf = (
            gpd.read_file(ranking_points, engine="pyogrio")
            .to_crs(crs)
            .query(f"valid_perc > {ranking_valid_perc}")
            .dropna(how="all")[model_ranking_cols + ["geometry"]]
        )
    except KeyError:
        error_msg = f"""
        Not all of the expected "rank_" columns {model_ranking_cols} were
        found in the columns of the ranking points file ({ranking_points}).
        Consider passing a custom list of models using `ensemble_models`.
        """
        raise Exception(textwrap.dedent(error_msg).strip()) from None

    # Use points to interpolate model rankings into requested x and y
    id_kwargs_str = "" if idw_kwargs == {} else idw_kwargs
    print(f"Interpolating model rankings using IDW interpolation {id_kwargs_str}")
    ensemble_ranks_df = (
        # Run IDW interpolation on subset of ranking columns
        pd.DataFrame(
            idw(
                input_z=model_ranks_gdf[model_ranking_cols],
                input_x=model_ranks_gdf.geometry.x,
                input_y=model_ranks_gdf.geometry.y,
                output_x=x,
                output_y=y,
                **idw_kwargs,
            ),
            columns=model_ranking_cols,
        )
        .assign(x=x, y=y)
        # Drop any duplicates then melt columns into long format
        .drop_duplicates()
        .melt(id_vars=["x", "y"], var_name="tide_model", value_name="rank")
        # Remore "rank_" prefix to get plain model names
        .replace({"^rank_": ""}, regex=True)
        # Set index columns and rank across groups
        .set_index(["tide_model", "x", "y"])
        .groupby(["x", "y"])
        .rank()
    )

    # If no custom ensemble funcs are provided, use a default ensemble
    # calculation that takes the mean of the top N tide models
    if ensemble_func is None:
        ensemble_func = {"ensemble": lambda x: x["rank"] <= ensemble_top_n}

    # Create output list to hold computed ensemble model outputs
    ensemble_list = []

    # Loop through all provided ensemble generation functions
    for ensemble_n, ensemble_f in ensemble_func.items():
        print(f"Combining models into single {ensemble_n} model")

        # Join ranks to input tide data, compute weightings and group
        grouped = (
            # Add tide model as an index so we can join with model ranks
            tide_df.set_index("tide_model", append=True)
            .join(ensemble_ranks_df)
            # Add temp columns containing weightings and weighted values
            .assign(
                weights=ensemble_f,  # use custom func to compute weights
                weighted=lambda i: i.tide_height * i.weights,
            )
            # Groupby is specified in a weird order here as this seems
            # to be the easiest way to preserve correct index sorting
            .groupby(["x", "y", "time"])
        )

        # Use weightings to combine multiple models into single ensemble
        ensemble_df = (
            # Calculate weighted mean and convert back to dataframe
            grouped.weighted.sum()
            .div(grouped.weights.sum())
            .to_frame("tide_height")
            # Label ensemble model and ensure indexes are in expected order
            .assign(tide_model=ensemble_n)
            .reorder_levels(["time", "x", "y"], axis=0)
        )

        ensemble_list.append(ensemble_df)

    # Combine all ensemble models and return as a single dataframe
    return pd.concat(ensemble_list)

model_phases

model_phases(
    x,
    y,
    time,
    model="EOT20",
    directory=None,
    time_offset="15 min",
    return_tides=False,
    **model_tides_kwargs
)

Model tide phases (low-flow, high-flow, high-ebb, low-ebb) at multiple coordinates and/or timesteps using using one or more ocean tide models.

Ebb and low phases are calculated by running the eo_tides.model.model_tides function twice, once for the requested timesteps, and again after subtracting a small time offset (by default, 15 minutes). If tides increased over this period, they are assigned as "flow"; if they decreased, they are assigned as "ebb". Tides are considered "high" if equal or greater than 0 metres tide height, otherwise "low".

This function supports all parameters that are supported by model_tides.

Parameters:

Name Type Description Default

x

float or list of float

One or more x and y coordinates used to define the location at which to model tide phases. By default these coordinates should be lat/lon; use "crs" if they are in a custom coordinate reference system.

required

y

float or list of float

One or more x and y coordinates used to define the location at which to model tide phases. By default these coordinates should be lat/lon; use "crs" if they are in a custom coordinate reference system.

required

time

DatetimeLike

Times at which to model tide phases (in UTC). Accepts any format that can be converted by pandas.to_datetime(); e.g. np.ndarray[datetime64], pd.DatetimeIndex, pd.Timestamp, datetime.datetime and strings (e.g. "2020-01-01 23:00"). For example: time=pd.date_range(start="2000", end="2001", freq="5h")

required

model

str or list of str

The tide model (or list of models) to use to model tides. Defaults to "EOT20"; specify "all" to use all models available in directory. For a full list of available and supported models, run eo_tides.model.list_models.

'EOT20'

directory

str

The directory containing tide model data files. If no path is provided, this will default to the environment variable EO_TIDES_TIDE_MODELS if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by pyTMD (https://geoscienceaustralia.github.io/eo-tides/setup/).

None

time_offset

str

The time offset/delta used to generate a time series of offset tide heights required for phase calculation. Defeaults to "15 min"; can be any string passed to pandas.Timedelta.

'15 min'

return_tides

bool

Whether to return intermediate modelled tide heights as a "tide_height" column in the output dataframe. Defaults to False.

False

**model_tides_kwargs

Optional parameters passed to the eo_tides.model.model_tides function. Important parameters include output_format (e.g. whether to return results in wide or long format), crop (whether to crop tide model constituent files on-the-fly to improve performance) etc.

{}

Returns:

Type Description
DataFrame

A dataframe containing modelled tide phases.

Source code in eo_tides/model.py
def model_phases(
    x: float | list[float] | xr.DataArray,
    y: float | list[float] | xr.DataArray,
    time: DatetimeLike,
    model: str | list[str] = "EOT20",
    directory: str | os.PathLike | None = None,
    time_offset: str = "15 min",
    return_tides: bool = False,
    **model_tides_kwargs,
) -> pd.DataFrame:
    """
    Model tide phases (low-flow, high-flow, high-ebb, low-ebb)
    at multiple coordinates and/or timesteps using using one
    or more ocean tide models.

    Ebb and low phases are calculated by running the
    `eo_tides.model.model_tides` function twice, once for
    the requested timesteps, and again after subtracting a
    small time offset (by default, 15 minutes). If tides
    increased over this period, they are assigned as "flow";
    if they decreased, they are assigned as "ebb".
    Tides are considered "high" if equal or greater than 0
    metres tide height, otherwise "low".

    This function supports all parameters that are supported
    by `model_tides`.

    Parameters
    ----------
    x, y : float or list of float
        One or more x and y coordinates used to define
        the location at which to model tide phases. By default
        these coordinates should be lat/lon; use "crs" if they
        are in a custom coordinate reference system.
    time : DatetimeLike
        Times at which to model tide phases (in UTC). Accepts
        any format that can be converted by `pandas.to_datetime()`;
        e.g. np.ndarray[datetime64], pd.DatetimeIndex, pd.Timestamp,
        datetime.datetime and strings (e.g. "2020-01-01 23:00").
        For example: `time=pd.date_range(start="2000", end="2001", freq="5h")`
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        Defaults to "EOT20"; specify "all" to use all models available
        in `directory`. For a full list of available and supported models,
        run `eo_tides.model.list_models`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    time_offset: str, optional
        The time offset/delta used to generate a time series of
        offset tide heights required for phase calculation. Defeaults
        to "15 min"; can be any string passed to `pandas.Timedelta`.
    return_tides: bool, optional
        Whether to return intermediate modelled tide heights as a
        "tide_height" column in the output dataframe. Defaults to False.
    **model_tides_kwargs :
        Optional parameters passed to the `eo_tides.model.model_tides`
        function. Important parameters include `output_format` (e.g.
        whether to return results in wide or long format), `crop`
        (whether to crop tide model constituent files on-the-fly to
        improve performance) etc.

    Returns
    -------
    pandas.DataFrame
        A dataframe containing modelled tide phases.

    """

    # Pop output format and mode for special handling
    output_format = model_tides_kwargs.pop("output_format", "long")
    mode = model_tides_kwargs.pop("mode", "one-to-many")

    # Model tides
    tide_df = model_tides(
        x=x,
        y=y,
        time=time,
        model=model,
        directory=directory,
        **model_tides_kwargs,
    )

    # Model tides for a time 15 minutes prior to each previously
    # modelled satellite acquisition time. This allows us to compare
    # tide heights to see if they are rising or falling.
    pre_df = model_tides(
        x=x,
        y=y,
        time=time - pd.Timedelta(time_offset),
        model=model,
        directory=directory,
        **model_tides_kwargs,
    )

    # Compare tides computed for each timestep. If the previous tide
    # was higher than the current tide, the tide is 'ebbing'. If the
    # previous tide was lower, the tide is 'flowing'
    ebb_flow = (tide_df.tide_height < pre_df.tide_height.values).replace({True: "ebb", False: "flow"})

    # If tides are greater than 0, then "high", otherwise "low"
    high_low = (tide_df.tide_height >= 0).replace({True: "high", False: "low"})

    # Combine into one string and add to data
    tide_df["tide_phase"] = high_low.astype(str) + "-" + ebb_flow.astype(str)

    # Optionally convert to a wide format dataframe with a tide model in
    # each dataframe column
    if output_format == "wide":
        # Pivot into wide format with each time model as a column
        print("Converting to a wide format dataframe")
        tide_df = tide_df.pivot(columns="tide_model")

        # If in 'one-to-one' mode, reindex using our input time/x/y
        # values to ensure the output is sorted the same as our inputs
        if mode == "one-to-one":
            output_indices = pd.MultiIndex.from_arrays([time, x, y], names=["time", "x", "y"])
            tide_df = tide_df.reindex(output_indices)

        # Optionally drop tides
        if not return_tides:
            return tide_df.drop("tide_height", axis=1)["tide_phase"]

    # Optionally drop tide heights
    if not return_tides:
        return tide_df.drop("tide_height", axis=1)

    return tide_df

model_tides

model_tides(
    x,
    y,
    time,
    model="EOT20",
    directory=None,
    crs="EPSG:4326",
    mode="one-to-many",
    output_format="long",
    output_units="m",
    method="linear",
    extrapolate=True,
    cutoff=None,
    crop=True,
    crop_buffer=5,
    parallel=True,
    parallel_splits="auto",
    parallel_max=None,
    ensemble_models=None,
    **ensemble_kwargs
)

Model tide heights at multiple coordinates and/or timesteps using using one or more ocean tide models.

This function is parallelised to improve performance, and supports all tidal models supported by pyTMD, including:

  • Empirical Ocean Tide model (EOT20)
  • Finite Element Solution tide models (FES2022, FES2014, FES2012)
  • TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
  • Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
  • Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)

This function requires access to tide model data files. These should be placed in a folder with subfolders matching the structure required by pyTMD. For more details: https://geoscienceaustralia.github.io/eo-tides/setup/ https://pytmd.readthedocs.io/en/latest/getting_started/Getting-Started.html#directories

This function is a modification of the pyTMD package's pyTMD.compute.tide_elevations function. For more info: https://pytmd.readthedocs.io/en/latest/api_reference/compute.html#pyTMD.compute.tide_elevations

Parameters:

Name Type Description Default

x

float or list of float

One or more x and y coordinates used to define the location at which to model tides. By default these coordinates should be lat/lon; use "crs" if they are in a custom coordinate reference system.

required

y

float or list of float

One or more x and y coordinates used to define the location at which to model tides. By default these coordinates should be lat/lon; use "crs" if they are in a custom coordinate reference system.

required

time

DatetimeLike

Times at which to model tide heights (in UTC). Accepts any format that can be converted by pandas.to_datetime(); e.g. np.ndarray[datetime64], pd.DatetimeIndex, pd.Timestamp, datetime.datetime and strings (e.g. "2020-01-01 23:00"). For example: time=pd.date_range(start="2000", end="2001", freq="5h")

required

model

str or list of str

The tide model (or list of models) to use to model tides. Defaults to "EOT20"; specify "all" to use all models available in directory. For a full list of available and supported models, run eo_tides.model.list_models.

'EOT20'

directory

str

The directory containing tide model data files. If no path is provided, this will default to the environment variable EO_TIDES_TIDE_MODELS if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by pyTMD (https://geoscienceaustralia.github.io/eo-tides/setup/).

None

crs

str

Input coordinate reference system for x and y coordinates. Defaults to "EPSG:4326" (WGS84; degrees latitude, longitude).

'EPSG:4326'

mode

str

The analysis mode to use for tide modelling. Supports two options:

  • "one-to-many": Models tides for every timestep in "time" at every input x and y coordinate point. This is useful if you want to model tides for a specific list of timesteps across multiple spatial points (e.g. for the same set of satellite acquisition times at various locations across your study area).
  • "one-to-one": Model tides using a unique timestep for each set of x and y coordinates. In this mode, the number of x and y points must equal the number of timesteps provided in "time".
'one-to-many'

output_format

str

Whether to return the output dataframe in long format (with results stacked vertically along "tide_model" and "tide_height" columns), or wide format (with a column for each tide model). Defaults to "long".

'long'

output_units

str

Whether to return modelled tides in floating point metre units, or integer centimetre units (i.e. scaled by 100) or integer millimetre units (i.e. scaled by 1000. Returning outputs in integer units can be useful for reducing memory usage. Defaults to "m" for metres; set to "cm" for centimetres or "mm" for millimetres.

'm'

method

str

Method used to interpolate tidal constituents from model files. Defaults to "linear"; options include:

  • "linear", "nearest": scipy regular grid interpolations
  • "spline": scipy bivariate spline interpolation
  • "bilinear": quick bilinear interpolation
'linear'

extrapolate

bool

Whether to extrapolate tides into x and y coordinates outside of the valid tide modelling domain using nearest-neighbor.

True

cutoff

float

Extrapolation cutoff in kilometers. The default is None, which will extrapolate for all points regardless of distance from the valid tide modelling domain.

None

crop

bool

Whether to crop tide model constituent files on-the-fly to improve performance. Defaults to True; use crop_buffer to customise the buffer distance used to crop the files.

True

crop_buffer

int or float

The buffer distance in degrees used to crop tide model constituent files around the modelling area. Defaults to 5, which will crop constituents using a five degree buffer on either side of the analysis extent.

5

parallel

bool

Whether to parallelise tide modelling. If multiple tide models are requested, these will be run in parallel using concurrent.futures. If enough workers are available, the analysis will also be split into spatial chunks for additional parallelisation (see "parallel_splits" below). Default is True.

True

parallel_splits

str or int

Whether to split the input x and y coordinates into smaller, evenly-sized chunks that are processed in parallel. This can provide a large performance boost when processing large numbers of coordinates. The default is "auto", which will automatically attempt to determine optimal splits based on available CPUs, the number of input points, and the number of models.

'auto'

parallel_max

int

Maximum number of processes to run in parallel. The default of None will automatically determine this from your available CPUs.

None

ensemble_models

list of str

An optional list of models used to generate the ensemble tide model if "ensemble" tide modelling is requested. Defaults to ["EOT20", "FES2012", "FES2014_extrapolated", "FES2022_extrapolated", "GOT4.10", "GOT5.5_extrapolated", "GOT5.6_extrapolated", "TPXO10-atlas-v2-nc", "TPXO8-atlas-nc", "TPXO9-atlas-v5-nc"].

None

**ensemble_kwargs

Keyword arguments used to customise the generation of optional ensemble tide models if "ensemble" modelling are requested. These are passed to the underlying _ensemble_model function. Useful parameters include ranking_points (path to model rankings data), k (for controlling how model rankings are interpolated), and ensemble_top_n (how many top models to use in the ensemble calculation).

{}

Returns:

Type Description
DataFrame

A dataframe containing modelled tide heights.

Source code in eo_tides/model.py
def model_tides(
    x: float | list[float] | xr.DataArray,
    y: float | list[float] | xr.DataArray,
    time: DatetimeLike,
    model: str | list[str] = "EOT20",
    directory: str | os.PathLike | None = None,
    crs: str = "EPSG:4326",
    mode: str = "one-to-many",
    output_format: str = "long",
    output_units: str = "m",
    method: str = "linear",
    extrapolate: bool = True,
    cutoff: float | None = None,
    crop: bool = True,
    crop_buffer: float | None = 5,
    parallel: bool = True,
    parallel_splits: int | str = "auto",
    parallel_max: int | None = None,
    ensemble_models: list[str] | None = None,
    **ensemble_kwargs,
) -> pd.DataFrame:
    """
    Model tide heights at multiple coordinates and/or timesteps
    using using one or more ocean tide models.

    This function is parallelised to improve performance, and
    supports all tidal models supported by `pyTMD`, including:

    - Empirical Ocean Tide model (EOT20)
    - Finite Element Solution tide models (FES2022, FES2014, FES2012)
    - TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
    - Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
    - Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)

    This function requires access to tide model data files.
    These should be placed in a folder with subfolders matching
    the structure required by `pyTMD`. For more details:
    <https://geoscienceaustralia.github.io/eo-tides/setup/>
    <https://pytmd.readthedocs.io/en/latest/getting_started/Getting-Started.html#directories>

    This function is a modification of the `pyTMD` package's
    `pyTMD.compute.tide_elevations` function. For more info:
    <https://pytmd.readthedocs.io/en/latest/api_reference/compute.html#pyTMD.compute.tide_elevations>

    Parameters
    ----------
    x, y : float or list of float
        One or more x and y coordinates used to define
        the location at which to model tides. By default these
        coordinates should be lat/lon; use "crs" if they
        are in a custom coordinate reference system.
    time : DatetimeLike
        Times at which to model tide heights (in UTC). Accepts
        any format that can be converted by `pandas.to_datetime()`;
        e.g. np.ndarray[datetime64], pd.DatetimeIndex, pd.Timestamp,
        datetime.datetime and strings (e.g. "2020-01-01 23:00").
        For example: `time=pd.date_range(start="2000", end="2001", freq="5h")`
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        Defaults to "EOT20"; specify "all" to use all models available
        in `directory`. For a full list of available and supported models,
        run `eo_tides.model.list_models`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    crs : str, optional
        Input coordinate reference system for x and y coordinates.
        Defaults to "EPSG:4326" (WGS84; degrees latitude, longitude).
    mode : str, optional
        The analysis mode to use for tide modelling. Supports two options:

        - "one-to-many": Models tides for every timestep in "time" at
        every input x and y coordinate point. This is useful if you
        want to model tides for a specific list of timesteps across
        multiple spatial points (e.g. for the same set of satellite
        acquisition times at various locations across your study area).
        - "one-to-one": Model tides using a unique timestep for each
        set of x and y coordinates. In this mode, the number of x and
        y points must equal the number of timesteps provided in "time".
    output_format : str, optional
        Whether to return the output dataframe in long format (with
        results stacked vertically along "tide_model" and "tide_height"
        columns), or wide format (with a column for each tide model).
        Defaults to "long".
    output_units : str, optional
        Whether to return modelled tides in floating point metre units,
        or integer centimetre units (i.e. scaled by 100) or integer
        millimetre units (i.e. scaled by 1000. Returning outputs in
        integer units can be useful for reducing memory usage.
        Defaults to "m" for metres; set to "cm" for centimetres or "mm"
        for millimetres.
    method : str, optional
        Method used to interpolate tidal constituents
        from model files. Defaults to "linear"; options include:

        - "linear", "nearest": scipy regular grid interpolations
        - "spline": scipy bivariate spline interpolation
        - "bilinear": quick bilinear interpolation
    extrapolate : bool, optional
        Whether to extrapolate tides into x and y coordinates outside of
        the valid tide modelling domain using nearest-neighbor.
    cutoff : float, optional
        Extrapolation cutoff in kilometers. The default is None, which
        will extrapolate for all points regardless of distance from the
        valid tide modelling domain.
    crop : bool, optional
        Whether to crop tide model constituent files on-the-fly to
        improve performance. Defaults to True; use `crop_buffer`
        to customise the buffer distance used to crop the files.
    crop_buffer : int or float, optional
        The buffer distance in degrees used to crop tide model
        constituent files around the modelling area. Defaults to 5,
        which will crop constituents using a five degree buffer on either
        side of the analysis extent.
    parallel : bool, optional
        Whether to parallelise tide modelling. If multiple tide models are
        requested, these will be run in parallel using `concurrent.futures`.
        If enough workers are available, the analysis will also be split
        into spatial chunks for additional parallelisation (see "parallel_splits"
        below). Default is True.
    parallel_splits : str or int, optional
        Whether to split the input x and y coordinates into smaller,
        evenly-sized chunks that are processed in parallel. This can
        provide a large performance boost when processing large numbers
        of coordinates. The default is "auto", which will automatically
        attempt to determine optimal splits based on available CPUs,
        the number of input points, and the number of models.
    parallel_max : int, optional
        Maximum number of processes to run in parallel. The default of
        None will automatically determine this from your available CPUs.
    ensemble_models : list of str, optional
        An optional list of models used to generate the ensemble tide
        model if "ensemble" tide modelling is requested. Defaults to
        `["EOT20", "FES2012", "FES2014_extrapolated", "FES2022_extrapolated",
        "GOT4.10", "GOT5.5_extrapolated", "GOT5.6_extrapolated",
        "TPXO10-atlas-v2-nc", "TPXO8-atlas-nc", "TPXO9-atlas-v5-nc"]`.
    **ensemble_kwargs :
        Keyword arguments used to customise the generation of optional
        ensemble tide models if "ensemble" modelling are requested.
        These are passed to the underlying `_ensemble_model` function.
        Useful parameters include `ranking_points` (path to model
        rankings data), `k` (for controlling how model rankings are
        interpolated), and `ensemble_top_n` (how many top models to use
        in the ensemble calculation).

    Returns
    -------
    pandas.DataFrame
        A dataframe containing modelled tide heights.

    """
    # Turn inputs into arrays for consistent handling
    x = np.atleast_1d(x)
    y = np.atleast_1d(y)
    time = _standardise_time(time)

    # Validate input arguments
    assert time is not None, "Times for modelling tides must be provided via `time`."
    assert method in ("bilinear", "spline", "linear", "nearest")
    assert output_units in (
        "m",
        "cm",
        "mm",
    ), "Output units must be either 'm', 'cm', or 'mm'."
    assert output_format in (
        "long",
        "wide",
    ), "Output format must be either 'long' or 'wide'."
    assert np.issubdtype(x.dtype, np.number), "`x` must contain only valid numeric values, and must not be None."
    assert np.issubdtype(y.dtype, np.number), "`y` must contain only valid numeric values, and must not be None.."
    assert len(x) == len(y), "x and y must be the same length."
    if mode == "one-to-one":
        assert len(x) == len(time), (
            "The number of supplied x and y points and times must be "
            "identical in 'one-to-one' mode. Use 'one-to-many' mode if "
            "you intended to model multiple timesteps at each point."
        )

    # Set tide modelling files directory. If no custom path is
    # provided, try global environment variable.
    directory = _set_directory(directory)

    # Standardise model list, handling "all" and "ensemble" functionality
    models_to_process, models_requested, ensemble_models = _standardise_models(
        model=model,
        directory=directory,
        ensemble_models=ensemble_models,
    )

    # Update tide modelling func to add default keyword arguments that
    # are used for every iteration during parallel processing
    iter_func = partial(
        _model_tides,
        directory=directory,
        crs=crs,
        mode=mode,
        output_units=output_units,
        method=method,
        extrapolate=extrapolate,
        cutoff=np.inf if cutoff is None else cutoff,
        crop=crop,
        crop_buffer=crop_buffer,
    )

    # If automatic parallel splits, calculate optimal value
    # based on available parallelisation, number of points
    # and number of models
    if parallel_splits == "auto":
        parallel_splits = _parallel_splits(
            total_points=len(x),
            model_count=len(models_to_process),
            parallel_max=parallel_max,
        )

    # Verify that parallel splits are not larger than number of points
    assert isinstance(parallel_splits, int)
    if parallel_splits > len(x):
        raise ValueError(f"Parallel splits ({parallel_splits}) cannot be larger than the number of points ({len(x)}).")

    # Parallelise if either multiple models or multiple splits requested
    if parallel & ((len(models_to_process) > 1) | (parallel_splits > 1)):
        with ProcessPoolExecutor(max_workers=parallel_max) as executor:
            print(
                f"Modelling tides with {', '.join(models_to_process)} in parallel (models: {len(models_to_process)}, splits: {parallel_splits})"
            )

            # Optionally split lon/lat points into `splits_n` chunks
            # that will be applied in parallel
            x_split = np.array_split(x, parallel_splits)
            y_split = np.array_split(y, parallel_splits)

            # Get every combination of models and lat/lon points, and
            # extract as iterables that can be passed to `executor.map()`
            # In "one-to-many" mode, pass entire set of timesteps to each
            # parallel iteration by repeating timesteps by number of total
            # parallel iterations. In "one-to-one" mode, split up
            # timesteps into smaller parallel chunks too.
            if mode == "one-to-many":
                model_iters, x_iters, y_iters = zip(
                    *[(m, x_split[i], y_split[i]) for m in models_to_process for i in range(parallel_splits)],
                )
                time_iters = [time] * len(model_iters)
            elif mode == "one-to-one":
                time_split = np.array_split(time, parallel_splits)
                model_iters, x_iters, y_iters, time_iters = zip(
                    *[
                        (m, x_split[i], y_split[i], time_split[i])
                        for m in models_to_process
                        for i in range(parallel_splits)
                    ],
                )

            # Apply func in parallel, iterating through each input param
            try:
                model_outputs = list(
                    tqdm(
                        executor.map(iter_func, model_iters, x_iters, y_iters, time_iters),
                        total=len(model_iters),
                    ),
                )
            except BrokenProcessPool:
                error_msg = (
                    "Parallelised tide modelling failed, likely to to an out-of-memory error. "
                    "Try reducing the size of your analysis, or set `parallel=False`."
                )
                raise RuntimeError(error_msg)

    # Model tides in series if parallelisation is off
    else:
        model_outputs = []

        for model_i in models_to_process:
            print(f"Modelling tides with {model_i}")
            tide_df = iter_func(model_i, x, y, time)
            model_outputs.append(tide_df)

    # Combine outputs into a single dataframe
    tide_df = pd.concat(model_outputs, axis=0)

    # Optionally compute ensemble model and add to dataframe
    if "ensemble" in models_requested:
        ensemble_df = ensemble_tides(tide_df, crs, ensemble_models, **ensemble_kwargs)

        # Update requested models with any custom ensemble models, then
        # filter the dataframe to keep only models originally requested
        models_requested = list(np.union1d(models_requested, ensemble_df.tide_model.unique()))
        tide_df = pd.concat([tide_df, ensemble_df]).query("tide_model in @models_requested")

    # Optionally convert to a wide format dataframe with a tide model in
    # each dataframe column
    if output_format == "wide":
        # Pivot into wide format with each time model as a column
        print("Converting to a wide format dataframe")
        tide_df = tide_df.pivot(columns="tide_model", values="tide_height")

        # If in 'one-to-one' mode, reindex using our input time/x/y
        # values to ensure the output is sorted the same as our inputs
        if mode == "one-to-one":
            output_indices = pd.MultiIndex.from_arrays([time, x, y], names=["time", "x", "y"])
            tide_df = tide_df.reindex(output_indices)

    return tide_df

eo_tides.eo

Functions:

Name Description
pixel_tides

Model tide heights for every pixel in a multi-dimensional

tag_tides

Model tide heights for every timestep in a multi-dimensional

pixel_tides

pixel_tides(
    data,
    time=None,
    model="EOT20",
    directory=None,
    resample=True,
    calculate_quantiles=None,
    resolution=None,
    buffer=None,
    resample_method="bilinear",
    dask_chunks=None,
    dask_compute=True,
    **model_tides_kwargs
)

Model tide heights for every pixel in a multi-dimensional dataset, using one or more ocean tide models.

This function models tides into a low-resolution tide modelling grid covering the spatial extent of the input data (buffered to reduce potential edge effects). These modelled tides can then be resampled back into the original higher resolution dataset's extent and resolution to produce a modelled tide height for every pixel through time.

This function uses the parallelised model_tides function under the hood. It supports all tidal models supported by pyTMD, including:

  • Empirical Ocean Tide model (EOT20)
  • Finite Element Solution tide models (FES2022, FES2014, FES2012)
  • TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
  • Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
  • Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)

This function requires access to tide model data files. These should be placed in a folder with subfolders matching the structure required by pyTMD. For more details: https://geoscienceaustralia.github.io/eo-tides/setup/ https://pytmd.readthedocs.io/en/latest/getting_started/Getting-Started.html#directories

Parameters:

Name Type Description Default

data

Dataset or DataArray or GeoBox

A multi-dimensional dataset or GeoBox pixel grid that will be used to define the spatial tide modelling grid. If data is an xarray object, it should include a "time" dimension. If no "time" dimension exists or if data is a GeoBox, then times must be passed using the time parameter.

required

time

DatetimeLike

By default, tides will be modelled using times from the "time" dimension of data. Alternatively, this param can be used to provide a custom set of times. Accepts any format that can be converted by pandas.to_datetime(). For example: time=pd.date_range(start="2000", end="2001", freq="5h")

None

model

str or list of str

The tide model (or list of models) to use to model tides. If a list is provided, a new "tide_model" dimension will be added to the xarray.DataArray outputs. Defaults to "EOT20"; specify "all" to use all models available in directory. For a full list of available and supported models, run eo_tides.model.list_models.

'EOT20'

directory

str

The directory containing tide model data files. If no path is provided, this will default to the environment variable EO_TIDES_TIDE_MODELS if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by pyTMD (https://geoscienceaustralia.github.io/eo-tides/setup/).

None

resample

bool

Whether to resample low resolution tides back into data's original higher resolution grid. Set this to False if you do not want low resolution tides to be re-projected back to higher resolution.

True

calculate_quantiles

tuple of float or numpy.ndarray

Rather than returning all individual tides, low-resolution tides can be first aggregated using a quantile calculation by passing in a tuple or array of quantiles to compute. For example, this could be used to calculate the min/max tide across all times: calculate_quantiles=(0.0, 1.0).

None

resolution

float

The desired resolution of the low-resolution grid used for tide modelling. The default None will create a 5000 m resolution grid if data has a projected CRS (i.e. metre units), or a 0.05 degree resolution grid if data has a geographic CRS (e.g. degree units). Note: higher resolutions do not necessarily provide better tide modelling performance, as results will be limited by the resolution of the underlying global tide model (e.g. 1/16th degree / ~5 km resolution grid for FES2014).

None

buffer

float

The amount by which to buffer the higher resolution grid extent when creating the new low resolution grid. This buffering ensures that modelled tides are seamless across analysis boundaries. This buffer is eventually be clipped away when the low-resolution modelled tides are re-projected back to the original resolution and extent of data. To ensure that at least two low-resolution grid pixels occur outside of the dataset bounds, the default None applies a 12000 m buffer if data has a projected CRS (i.e. metre units), or a 0.12 degree buffer if data has a geographic CRS (e.g. degree units).

None

resample_method

str

If resampling is requested (see resample above), use this resampling method when resampling from low resolution to high resolution pixels. Defaults to "bilinear"; valid options include "nearest", "cubic", "min", "max", "average" etc.

'bilinear'

dask_chunks

tuple of float

Can be used to configure custom Dask chunking for the final resampling step. By default, chunks will be automatically set to match y/x chunks from data if they exist; otherwise chunks will be chosen to cover the entire y/x extent of the dataset. For custom chunks, provide a tuple in the form (y, x), e.g. (2048, 2048).

None

dask_compute

bool

Whether to compute results of the resampling step using Dask. If False, tides_highres will be returned as a Dask-enabled array.

True

**model_tides_kwargs

Optional parameters passed to the eo_tides.model.model_tides function. Important parameters include cutoff (used to extrapolate modelled tides away from the coast; defaults to np.inf), crop (whether to crop tide model constituent files on-the-fly to improve performance) etc.

{}

Returns:

Name Type Description
tides_da DataArray

A three-dimensional tide height array. If resample=True (default), a high-resolution array of tide heights will be returned that matches the exact spatial resolution and extents of data. This will contain either tide heights for every timestep in data (or in times if provided), or tide height quantiles for every quantile provided by calculate_quantiles. If resample=False, results for the intermediate low-resolution tide modelling grid will be returned instead.

Source code in eo_tides/eo.py
def pixel_tides(
    data: xr.Dataset | xr.DataArray | GeoBox,
    time: DatetimeLike | None = None,
    model: str | list[str] = "EOT20",
    directory: str | os.PathLike | None = None,
    resample: bool = True,
    calculate_quantiles: np.ndarray | tuple[float, float] | None = None,
    resolution: float | None = None,
    buffer: float | None = None,
    resample_method: str = "bilinear",
    dask_chunks: tuple[float, float] | None = None,
    dask_compute: bool = True,
    **model_tides_kwargs,
) -> xr.DataArray:
    """
    Model tide heights for every pixel in a multi-dimensional
    dataset, using one or more ocean tide models.

    This function models tides into a low-resolution tide
    modelling grid covering the spatial extent of the input
    data (buffered to reduce potential edge effects). These
    modelled tides can then be resampled back into the original
    higher resolution dataset's extent and resolution to
    produce a modelled tide height for every pixel through time.

    This function uses the parallelised `model_tides` function
    under the hood. It supports all tidal models supported by
    `pyTMD`, including:

    - Empirical Ocean Tide model (EOT20)
    - Finite Element Solution tide models (FES2022, FES2014, FES2012)
    - TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
    - Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
    - Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)

    This function requires access to tide model data files.
    These should be placed in a folder with subfolders matching
    the structure required by `pyTMD`. For more details:
    <https://geoscienceaustralia.github.io/eo-tides/setup/>
    <https://pytmd.readthedocs.io/en/latest/getting_started/Getting-Started.html#directories>

    Parameters
    ----------
    data : xarray.Dataset or xarray.DataArray or odc.geo.geobox.GeoBox
        A multi-dimensional dataset or GeoBox pixel grid that will
        be used to define the spatial tide modelling grid. If `data`
        is an xarray object, it should include a "time" dimension.
        If no "time" dimension exists or if `data` is a GeoBox,
        then times must be passed using the `time` parameter.
    time : DatetimeLike, optional
        By default, tides will be modelled using times from the
        "time" dimension of `data`. Alternatively, this param can
        be used to provide a custom set of times. Accepts any format
        that can be converted by `pandas.to_datetime()`. For example:
        `time=pd.date_range(start="2000", end="2001", freq="5h")`
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        If a list is provided, a new "tide_model" dimension will be
        added to the `xarray.DataArray` outputs. Defaults to "EOT20";
        specify "all" to use all models available in `directory`.
        For a full list of available and supported models, run
        `eo_tides.model.list_models`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    resample : bool, optional
        Whether to resample low resolution tides back into `data`'s original
        higher resolution grid. Set this to `False` if you do not want
        low resolution tides to be re-projected back to higher resolution.
    calculate_quantiles : tuple of float or numpy.ndarray, optional
        Rather than returning all individual tides, low-resolution tides
        can be first aggregated using a quantile calculation by passing in
        a tuple or array of quantiles to compute. For example, this could
        be used to calculate the min/max tide across all times:
        `calculate_quantiles=(0.0, 1.0)`.
    resolution : float, optional
        The desired resolution of the low-resolution grid used for tide
        modelling. The default None will create a 5000 m resolution grid
        if `data` has a projected CRS (i.e. metre units), or a 0.05 degree
        resolution grid if `data` has a geographic CRS (e.g. degree units).
        Note: higher resolutions do not necessarily provide better
        tide modelling performance, as results will be limited by the
        resolution of the underlying global tide model (e.g. 1/16th
        degree / ~5 km resolution grid for FES2014).
    buffer : float, optional
        The amount by which to buffer the higher resolution grid extent
        when creating the new low resolution grid. This buffering
        ensures that modelled tides are seamless across analysis
        boundaries. This buffer is eventually be clipped away when
        the low-resolution modelled tides are re-projected back to the
        original resolution and extent of `data`. To ensure that at least
        two low-resolution grid pixels occur outside of the dataset
        bounds, the default None applies a 12000 m buffer if `data` has a
        projected CRS (i.e. metre units), or a 0.12 degree buffer if
        `data` has a geographic CRS (e.g. degree units).
    resample_method : str, optional
        If resampling is requested (see `resample` above), use this
        resampling method when resampling from low resolution to high
        resolution pixels. Defaults to "bilinear"; valid options include
        "nearest", "cubic", "min", "max", "average" etc.
    dask_chunks : tuple of float, optional
        Can be used to configure custom Dask chunking for the final
        resampling step. By default, chunks will be automatically set
        to match y/x chunks from `data` if they exist; otherwise chunks
        will be chosen to cover the entire y/x extent of the dataset.
        For custom chunks, provide a tuple in the form `(y, x)`, e.g.
        `(2048, 2048)`.
    dask_compute : bool, optional
        Whether to compute results of the resampling step using Dask.
        If False, `tides_highres` will be returned as a Dask-enabled array.
    **model_tides_kwargs :
        Optional parameters passed to the `eo_tides.model.model_tides`
        function. Important parameters include `cutoff` (used to
        extrapolate modelled tides away from the coast; defaults to
        `np.inf`), `crop` (whether to crop tide model constituent files
        on-the-fly to improve performance) etc.
    Returns
    -------
    tides_da : xr.DataArray
        A three-dimensional tide height array.
        If `resample=True` (default), a high-resolution array of tide
        heights will be returned that matches the exact spatial resolution
        and extents of `data`. This will contain either tide heights for
        every timestep in `data` (or in `times` if provided), or tide height
        quantiles for every quantile provided by `calculate_quantiles`.
        If `resample=False`, results for the intermediate low-resolution
        tide modelling grid will be returned instead.
    """
    # Standardise data inputs, time and models
    gbox, time_coords = _standardise_inputs(data, time)
    dask_chunks = _resample_chunks(data, dask_chunks)
    model = [model] if isinstance(model, str) else model

    # Determine spatial dimensions
    y_dim, x_dim = gbox.dimensions

    # Determine resolution and buffer, using different defaults for
    # geographic (i.e. degrees) and projected (i.e. metres) CRSs:
    assert gbox.crs is not None
    crs_units = gbox.crs.units[0][0:6]
    if gbox.crs.geographic:
        if resolution is None:
            resolution = 0.05
        elif resolution > 360:
            raise ValueError(
                f"A resolution of greater than 360 was "
                f"provided, but `data` has a geographic CRS "
                f"in {crs_units} units. Did you accidently "
                f"provide a resolution in projected "
                f"(i.e. metre) units?",
            )
        if buffer is None:
            buffer = 0.12
    else:
        if resolution is None:
            resolution = 5000
        elif resolution < 1:
            raise ValueError(
                f"A resolution of less than 1 was provided, "
                f"but `data` has a projected CRS in "
                f"{crs_units} units. Did you accidently "
                f"provide a resolution in geographic "
                f"(degree) units?",
            )
        if buffer is None:
            buffer = 12000

    # Raise error if resolution is less than dataset resolution
    dataset_res = gbox.resolution.x
    if resolution < dataset_res:
        raise ValueError(
            f"The resolution of the low-resolution tide "
            f"modelling grid ({resolution:.2f}) is less "
            f"than `data`'s pixel resolution ({dataset_res:.2f}). "
            f"This can cause extremely slow tide modelling "
            f"performance. Please select provide a resolution "
            f"greater than {dataset_res:.2f} using "
            f"`pixel_tides`'s 'resolution' parameter.",
        )

    # Create a new reduced resolution tide modelling grid after
    # first buffering the grid
    print(f"Creating reduced resolution {resolution} x {resolution} {crs_units} tide modelling array")
    buffered_geobox = gbox.buffered(buffer)
    rescaled_geobox = GeoBox.from_bbox(bbox=buffered_geobox.boundingbox, resolution=resolution)
    rescaled_ds = odc.geo.xr.xr_zeros(rescaled_geobox)

    # Flatten grid to 1D, then add time dimension
    flattened_ds = rescaled_ds.stack(z=(x_dim, y_dim))
    flattened_ds = flattened_ds.expand_dims(dim={"time": time_coords})

    # Model tides in parallel, returning a pandas.DataFrame
    tide_df = model_tides(
        x=flattened_ds[x_dim],
        y=flattened_ds[y_dim],
        time=flattened_ds.time,
        crs=f"EPSG:{gbox.crs.epsg}",
        model=model,
        directory=directory,
        **model_tides_kwargs,
    )

    # Convert our pandas.DataFrame tide modelling outputs to xarray
    tides_lowres = (
        # Rename x and y dataframe indexes to match x and y xarray dims
        tide_df.rename_axis(["time", x_dim, y_dim])
        # Add tide model column to dataframe indexes so we can convert
        # our dataframe to a multidimensional xarray
        .set_index("tide_model", append=True)
        # Convert to xarray and select our tide modelling xr.DataArray
        .to_xarray()
        .tide_height
        # Re-index and transpose into our input coordinates and dim order
        .reindex_like(rescaled_ds)
        .transpose("tide_model", "time", y_dim, x_dim)
    )

    # Optionally calculate and return quantiles rather than raw data.
    # Set dtype to dtype of the input data as quantile always returns
    # float64 (memory intensive)
    if calculate_quantiles is not None:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            print("Computing tide quantiles")
            tides_lowres = tides_lowres.quantile(q=calculate_quantiles, dim="time").astype(tides_lowres.dtype)

    # If only one tidal model exists, squeeze out "tide_model" dim
    if len(tides_lowres.tide_model) == 1:
        tides_lowres = tides_lowres.squeeze("tide_model")

    # Ensure CRS is present before we apply any resampling
    tides_lowres = tides_lowres.odc.assign_crs(gbox.crs)

    # Reproject into original high resolution grid
    if resample:
        print("Reprojecting tides into original resolution")
        tides_highres = _pixel_tides_resample(
            tides_lowres,
            gbox,
            resample_method,
            dask_chunks,
            dask_compute,
        )
        return tides_highres

    print("Returning low resolution tide array")
    return tides_lowres

tag_tides

tag_tides(
    data,
    time=None,
    model="EOT20",
    directory=None,
    tidepost_lat=None,
    tidepost_lon=None,
    **model_tides_kwargs
)

Model tide heights for every timestep in a multi-dimensional dataset, and return a new tide_height array that can be used to "tag" each observation with tide heights.

The function models tides at the centroid of the dataset by default, but a custom tidal modelling location can be specified using tidepost_lat and tidepost_lon.

This function uses the parallelised model_tides function under the hood. It supports all tidal models supported by pyTMD, including:

  • Empirical Ocean Tide model (EOT20)
  • Finite Element Solution tide models (FES2022, FES2014, FES2012)
  • TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
  • Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
  • Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)

Parameters:

Name Type Description Default

data

Dataset or DataArray or GeoBox

A multi-dimensional dataset or GeoBox pixel grid that will be used to define the tide modelling location. If data is an xarray object, it should include a "time" dimension. If no "time" dimension exists or if data is a GeoBox, then times must be passed using the time parameter.

required

time

DatetimeLike

By default, tides will be modelled using times from the "time" dimension of data. Alternatively, this param can be used to provide a custom set of times. Accepts any format that can be converted by pandas.to_datetime(). For example: time=pd.date_range(start="2000", end="2001", freq="5h")

None

model

str or list of str

The tide model (or list of models) to use to model tides. If a list is provided, a new "tide_model" dimension will be added to the xarray.DataArray outputs. Defaults to "EOT20"; specify "all" to use all models available in directory. For a full list of available and supported models, run eo_tides.model.list_models.

'EOT20'

directory

str

The directory containing tide model data files. If no path is provided, this will default to the environment variable EO_TIDES_TIDE_MODELS if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by pyTMD (https://geoscienceaustralia.github.io/eo-tides/setup/).

None

tidepost_lat

float

Optional coordinates used to model tides. The default is None, which uses the centroid of the dataset as the tide modelling location.

None

tidepost_lon

float

Optional coordinates used to model tides. The default is None, which uses the centroid of the dataset as the tide modelling location.

None

**model_tides_kwargs

Optional parameters passed to the eo_tides.model.model_tides function. Important parameters include cutoff (used to extrapolate modelled tides away from the coast; defaults to np.inf), crop (whether to crop tide model constituent files on-the-fly to improve performance) etc.

{}

Returns:

Name Type Description
tides_da DataArray

A one-dimensional tide height array. This will contain either tide heights for every timestep in data, or for every time in times if provided.

Source code in eo_tides/eo.py
def tag_tides(
    data: xr.Dataset | xr.DataArray | GeoBox,
    time: DatetimeLike | None = None,
    model: str | list[str] = "EOT20",
    directory: str | os.PathLike | None = None,
    tidepost_lat: float | None = None,
    tidepost_lon: float | None = None,
    **model_tides_kwargs,
) -> xr.DataArray:
    """
    Model tide heights for every timestep in a multi-dimensional
    dataset, and return a new `tide_height` array that can
    be used to "tag" each observation with tide heights.

    The function models tides at the centroid of the dataset
    by default, but a custom tidal modelling location can
    be specified using `tidepost_lat` and `tidepost_lon`.

    This function uses the parallelised `model_tides` function
    under the hood. It supports all tidal models supported by
    `pyTMD`, including:

    - Empirical Ocean Tide model (EOT20)
    - Finite Element Solution tide models (FES2022, FES2014, FES2012)
    - TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
    - Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
    - Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)

    Parameters
    ----------
    data : xarray.Dataset or xarray.DataArray or odc.geo.geobox.GeoBox
        A multi-dimensional dataset or GeoBox pixel grid that will
        be used to define the tide modelling location. If `data`
        is an xarray object, it should include a "time" dimension.
        If no "time" dimension exists or if `data` is a GeoBox,
        then times must be passed using the `time` parameter.
    time : DatetimeLike, optional
        By default, tides will be modelled using times from the
        "time" dimension of `data`. Alternatively, this param can
        be used to provide a custom set of times. Accepts any format
        that can be converted by `pandas.to_datetime()`. For example:
        `time=pd.date_range(start="2000", end="2001", freq="5h")`
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        If a list is provided, a new "tide_model" dimension will be
        added to the `xarray.DataArray` outputs. Defaults to "EOT20";
        specify "all" to use all models available in `directory`.
        For a full list of available and supported models, run
        `eo_tides.model.list_models`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    tidepost_lat, tidepost_lon : float, optional
        Optional coordinates used to model tides. The default is None,
        which uses the centroid of the dataset as the tide modelling
        location.
    **model_tides_kwargs :
        Optional parameters passed to the `eo_tides.model.model_tides`
        function. Important parameters include `cutoff` (used to
        extrapolate modelled tides away from the coast; defaults to
        `np.inf`), `crop` (whether to crop tide model constituent files
        on-the-fly to improve performance) etc.

    Returns
    -------
    tides_da : xr.DataArray
        A one-dimensional tide height array. This will contain either
        tide heights for every timestep in `data`, or for every time in
        `times` if provided.
    """
    # Standardise data inputs, time and models
    gbox, time_coords = _standardise_inputs(data, time)
    model = [model] if isinstance(model, str) else model

    # If custom tide posts are not provided, use dataset centroid
    if tidepost_lat is None or tidepost_lon is None:
        lon, lat = gbox.geographic_extent.centroid.coords[0]
        print(f"Setting tide modelling location from dataset centroid: {lon:.2f}, {lat:.2f}")
    else:
        lon, lat = tidepost_lon, tidepost_lat
        print(f"Using tide modelling location: {lon:.2f}, {lat:.2f}")

    # Model tide heights for each observation:
    tide_df = model_tides(
        x=lon,  # type: ignore
        y=lat,  # type: ignore
        time=time_coords,
        model=model,
        directory=directory,
        crs="EPSG:4326",
        **model_tides_kwargs,
    )

    # If tides cannot be successfully modeled (e.g. if the centre of the
    # xarray dataset is located is over land), raise an exception
    if tide_df.tide_height.isnull().all():
        raise ValueError(
            f"Tides could not be modelled for dataset centroid located "
            f"at {tidepost_lon:.2f}, {tidepost_lat:.2f}. This can occur if "
            f"this coordinate occurs over land. Please manually specify "
            f"a tide modelling location located over water using the "
            f"`tidepost_lat` and `tidepost_lon` parameters."
        )

    # Convert to xarray format
    tides_da = tide_df.reset_index().set_index(["time", "tide_model"]).drop(["x", "y"], axis=1).tide_height.to_xarray()

    # If only one tidal model exists, squeeze out "tide_model" dim
    if len(tides_da.tide_model) == 1:
        tides_da = tides_da.squeeze("tide_model")

    return tides_da

eo_tides.stats

Functions:

Name Description
pixel_stats

Takes a multi-dimensional dataset and generate spatial

tide_stats

Takes a multi-dimensional dataset and generate tide statistics

pixel_stats

pixel_stats(
    data,
    time=None,
    model="EOT20",
    directory=None,
    resample=True,
    modelled_freq="3h",
    min_max_q=(0.0, 1.0),
    resample_method="bilinear",
    dask_chunks=None,
    dask_compute=True,
    extrapolate=True,
    cutoff=10,
    **pixel_tides_kwargs
)

Takes a multi-dimensional dataset and generate spatial tide statistics and satellite-observed tide bias metrics, calculated based on every timestep in the satellite data and modelled into the spatial extent of the imagery.

By comparing the subset of tides observed by satellites against the full astronomical tidal range, we can evaluate whether the tides observed by satellites are biased (e.g. fail to observe either the highest or lowest tides) due to tide aliasing interactions with sun-synchronous satellite overpasses.

Compared to tide_stats, this function models tide metrics spatially to produce a two-dimensional output for each statistic.

For more information about the tidal statistics computed by this function, refer to Figure 8 in Bishop-Taylor et al. 2018: https://www.sciencedirect.com/science/article/pii/S0272771418308783#fig8

Parameters:

Name Type Description Default

data

Dataset or DataArray or GeoBox

A multi-dimensional dataset or GeoBox pixel grid that will be used to calculate spatial tide statistics. If data is an xarray object, it should include a "time" dimension. If no "time" dimension exists or if data is a GeoBox, then times must be passed using the time parameter.

required

time

DatetimeLike

By default, tides will be modelled using times from the "time" dimension of data. Alternatively, this param can be used to provide a custom set of times. Accepts any format that can be converted by pandas.to_datetime(). For example: time=pd.date_range(start="2000", end="2001", freq="5h")

None

model

str or list of str

The tide model (or list of models) to use to model tides. If a list is provided, a new "tide_model" dimension will be added to the xarray.Dataset output. Defaults to "EOT20"; specify "all" to use all models available in directory. For a full list of available and supported models, run eo_tides.model.list_models.

'EOT20'

directory

str

The directory containing tide model data files. If no path is provided, this will default to the environment variable EO_TIDES_TIDE_MODELS if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by pyTMD (https://geoscienceaustralia.github.io/eo-tides/setup/).

None

resample

bool

Whether to resample tide statistics back into data's original higher resolution grid. Set this to False if you want to return lower-resolution tide statistics (which can be useful for assessing tide biases across large spatial extents).

True

modelled_freq

str

An optional string giving the frequency at which to model tides when computing the full modelled tidal range. Defaults to '3h', which computes a tide height for every three hours across the temporal extent of data.

'3h'

min_max_q

tuple

Quantiles used to calculate max and min observed and modelled astronomical tides. By default (0.0, 1.0) which is equivalent to minimum and maximum; for a softer threshold that is more robust to outliers use e.g. (0.1, 0.9).

(0.0, 1.0)

resample_method

str

If resampling is requested (see resample above), use this resampling method when resampling from low resolution to high resolution pixels. Defaults to "bilinear"; valid options include "nearest", "cubic", "min", "max", "average" etc.

'bilinear'

dask_chunks

tuple of float

Can be used to configure custom Dask chunking for the final resampling step. By default, chunks will be automatically set to match y/x chunks from data if they exist; otherwise chunks will be chosen to cover the entire y/x extent of the dataset. For custom chunks, provide a tuple in the form (y, x), e.g. (2048, 2048).

None

dask_compute

bool

Whether to compute results of the resampling step using Dask. If False, stats_ds will be returned as a Dask-enabled array.

True

extrapolate

bool

Whether to extrapolate tides into x and y coordinates outside of the valid tide modelling domain using nearest-neighbor. Defaults to True.

True

cutoff

float

Extrapolation cutoff in kilometers. To avoid producing tide statistics too far inland, the default is 10 km.

10

**pixel_tides_kwargs

Optional parameters passed to the eo_tides.eo.pixel_tides function.

{}

Returns:

Name Type Description
stats_ds Dataset

An xarray.Dataset containing the following statistics as two-dimensional data variables:

  • mot: mean tide height observed by the satellite (metres)
  • mat: mean modelled astronomical tide height (metres)
  • lot: minimum tide height observed by the satellite (metres)
  • lat: minimum tide height from modelled astronomical tidal range (metres)
  • hot: maximum tide height observed by the satellite (metres)
  • hat: maximum tide height from modelled astronomical tidal range (metres)
  • otr: tidal range observed by the satellite (metres)
  • tr: modelled astronomical tide range (metres)
  • spread: proportion of the full modelled tidal range observed by the satellite
  • offset_low: proportion of the lowest tides never observed by the satellite
  • offset_high: proportion of the highest tides never observed by the satellite
Source code in eo_tides/stats.py
def pixel_stats(
    data: xr.Dataset | xr.DataArray | GeoBox,
    time: DatetimeLike | None = None,
    model: str | list[str] = "EOT20",
    directory: str | os.PathLike | None = None,
    resample: bool = True,
    modelled_freq: str = "3h",
    min_max_q: tuple[float, float] = (0.0, 1.0),
    resample_method: str = "bilinear",
    dask_chunks: tuple[float, float] | None = None,
    dask_compute: bool = True,
    extrapolate: bool = True,
    cutoff: float = 10,
    **pixel_tides_kwargs,
) -> xr.Dataset:
    """
    Takes a multi-dimensional dataset and generate spatial
    tide statistics and satellite-observed tide bias metrics,
    calculated based on every timestep in the satellite data and
    modelled into the spatial extent of the imagery.

    By comparing the subset of tides observed by satellites
    against the full astronomical tidal range, we can evaluate
    whether the tides observed by satellites are biased
    (e.g. fail to observe either the highest or lowest tides)
    due to tide aliasing interactions with sun-synchronous satellite
    overpasses.

    Compared to `tide_stats`, this function models tide metrics
    spatially to produce a two-dimensional output for each statistic.

    For more information about the tidal statistics computed by this
    function, refer to Figure 8 in Bishop-Taylor et al. 2018:
    <https://www.sciencedirect.com/science/article/pii/S0272771418308783#fig8>

    Parameters
    ----------
    data : xarray.Dataset or xarray.DataArray or odc.geo.geobox.GeoBox
        A multi-dimensional dataset or GeoBox pixel grid that will
        be used to calculate spatial tide statistics. If `data`
        is an xarray object, it should include a "time" dimension.
        If no "time" dimension exists or if `data` is a GeoBox,
        then times must be passed using the `time` parameter.
    time : DatetimeLike, optional
        By default, tides will be modelled using times from the
        "time" dimension of `data`. Alternatively, this param can
        be used to provide a custom set of times. Accepts any format
        that can be converted by `pandas.to_datetime()`. For example:
        `time=pd.date_range(start="2000", end="2001", freq="5h")`
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        If a list is provided, a new "tide_model" dimension will be
        added to the `xarray.Dataset` output. Defaults to "EOT20";
        specify "all" to use all models available in `directory`.
        For a full list of available and supported models, run
        `eo_tides.model.list_models`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    resample : bool, optional
        Whether to resample tide statistics back into `data`'s original
        higher resolution grid. Set this to `False` if you want to return
        lower-resolution tide statistics (which can be useful for
        assessing tide biases across large spatial extents).
    modelled_freq : str, optional
        An optional string giving the frequency at which to model tides
        when computing the full modelled tidal range. Defaults to '3h',
        which computes a tide height for every three hours across the
        temporal extent of `data`.
    min_max_q : tuple, optional
        Quantiles used to calculate max and min observed and modelled
        astronomical tides. By default `(0.0, 1.0)` which is equivalent
        to minimum and maximum; for a softer threshold that is more
        robust to outliers use e.g. `(0.1, 0.9)`.
    resample_method : str, optional
        If resampling is requested (see `resample` above), use this
        resampling method when resampling from low resolution to high
        resolution pixels. Defaults to "bilinear"; valid options include
        "nearest", "cubic", "min", "max", "average" etc.
    dask_chunks : tuple of float, optional
        Can be used to configure custom Dask chunking for the final
        resampling step. By default, chunks will be automatically set
        to match y/x chunks from `data` if they exist; otherwise chunks
        will be chosen to cover the entire y/x extent of the dataset.
        For custom chunks, provide a tuple in the form `(y, x)`, e.g.
        `(2048, 2048)`.
    dask_compute : bool, optional
        Whether to compute results of the resampling step using Dask.
        If False, `stats_ds` will be returned as a Dask-enabled array.
    extrapolate : bool, optional
        Whether to extrapolate tides into x and y coordinates outside of
        the valid tide modelling domain using nearest-neighbor. Defaults
        to True.
    cutoff : float, optional
        Extrapolation cutoff in kilometers. To avoid producing tide
        statistics too far inland, the default is 10 km.
    **pixel_tides_kwargs :
        Optional parameters passed to the `eo_tides.eo.pixel_tides`
        function.

    Returns
    -------
    stats_ds : xarray.Dataset
        An `xarray.Dataset` containing the following statistics as two-dimensional data variables:

        - `mot`: mean tide height observed by the satellite (metres)
        - `mat`: mean modelled astronomical tide height (metres)
        - `lot`: minimum tide height observed by the satellite (metres)
        - `lat`: minimum tide height from modelled astronomical tidal range (metres)
        - `hot`: maximum tide height observed by the satellite (metres)
        - `hat`: maximum tide height from modelled astronomical tidal range (metres)
        - `otr`: tidal range observed by the satellite (metres)
        - `tr`: modelled astronomical tide range (metres)
        - `spread`: proportion of the full modelled tidal range observed by the satellite
        - `offset_low`: proportion of the lowest tides never observed by the satellite
        - `offset_high`: proportion of the highest tides never observed by the satellite

    """

    # Standardise data inputs, time and models
    gbox, obs_times = _standardise_inputs(data, time)
    dask_chunks = _resample_chunks(data, dask_chunks)
    model = [model] if isinstance(model, str) else model

    # Generate range of times covering entire period of satellite record
    assert obs_times is not None
    all_times = pd.date_range(
        start=obs_times.min().item(),
        end=obs_times.max().item(),
        freq=modelled_freq,
    )

    # Model tides for observed timesteps
    obs_tides_da = pixel_tides(
        gbox,
        time=obs_times,
        model=model,
        directory=directory,
        resample=False,
        extrapolate=extrapolate,
        cutoff=cutoff,
        **pixel_tides_kwargs,
    )

    # Model tides for all modelled timesteps
    all_tides_da = pixel_tides(
        gbox,
        time=all_times,
        model=model,
        directory=directory,
        resample=False,
        extrapolate=extrapolate,
        cutoff=cutoff,
        **pixel_tides_kwargs,
    )

    # Calculate statistics
    stats_lowres = _tide_statistics(obs_tides_da, all_tides_da, min_max_q=min_max_q)

    # Assign CRS and geobox to allow reprojection
    stats_lowres = stats_lowres.odc.assign_crs(crs=gbox.crs)

    # Reproject statistics into original high resolution grid
    if resample:
        print("Reprojecting statistics into original resolution")
        stats_highres = _pixel_tides_resample(
            stats_lowres,
            gbox,
            resample_method,
            dask_chunks,
            dask_compute,
            None,
        )
        return stats_highres

    print("Returning low resolution statistics array")
    return stats_lowres

tide_stats

tide_stats(
    data,
    time=None,
    model="EOT20",
    directory=None,
    tidepost_lat=None,
    tidepost_lon=None,
    plain_english=True,
    plot=True,
    plot_var=None,
    point_col=None,
    modelled_freq="3h",
    min_max_q=(0.0, 1.0),
    round_stats=3,
    **tag_tides_kwargs
)

Takes a multi-dimensional dataset and generate tide statistics and satellite-observed tide bias metrics, calculated based on every timestep in the satellite data and the geographic centroid of the imagery.

By comparing the subset of tides observed by satellites against the full astronomical tidal range, we can evaluate whether the tides observed by satellites are biased (e.g. fail to observe either the highest or lowest tides) due to tide aliasing interactions with sun-synchronous satellite overpasses.

For more information about the tidal statistics computed by this function, refer to Figure 8 in Bishop-Taylor et al. 2018: https://www.sciencedirect.com/science/article/pii/S0272771418308783#fig8

Parameters:

Name Type Description Default

data

Dataset or DataArray or GeoBox

A multi-dimensional dataset or GeoBox pixel grid that will be used to calculate tide statistics. If data is an xarray object, it should include a "time" dimension. If no "time" dimension exists or if data is a GeoBox, then times must be passed using the time parameter.

required

time

DatetimeLike

By default, tides will be modelled using times from the "time" dimension of data. Alternatively, this param can be used to provide a custom set of times. Accepts any format that can be converted by pandas.to_datetime(). For example: time=pd.date_range(start="2000", end="2001", freq="5h")

None

model

str or list of str

The tide model (or list of models) to use to model tides. If a list is provided, the resulting statistics will be returned as a pandas.Dataframe; otherwise a pandas.Series. Defaults to "EOT20"; specify "all" to use all models available in directory. For a full list of available and supported models, run eo_tides.model.list_models.

'EOT20'

directory

str

The directory containing tide model data files. If no path is provided, this will default to the environment variable EO_TIDES_TIDE_MODELS if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by pyTMD (https://geoscienceaustralia.github.io/eo-tides/setup/).

None

tidepost_lat

float or int

Optional coordinates used to model tides. The default is None, which uses the centroid of the dataset as the tide modelling location.

None

tidepost_lon

float or int

Optional coordinates used to model tides. The default is None, which uses the centroid of the dataset as the tide modelling location.

None

plain_english

bool

An optional boolean indicating whether to print a plain english version of the tidal statistics to the screen. Defaults to True; only supported when a single tide model is passed to model.

True

plot

bool

An optional boolean indicating whether to plot how satellite- observed tide heights compare against the full tidal range. Defaults to True; only supported when a single tide model is passed to model.

True

plot_var

str

Optional name of a coordinate, dimension or variable in the array that will be used to plot observations with unique symbols. Defaults to None, which will plot all observations as circles.

None

point_col

str

Colour used to plot points on the graph. Defaults to None which will automatically select colours.

None

modelled_freq

str

An optional string giving the frequency at which to model tides when computing the full modelled tidal range. Defaults to '3h', which computes a tide height for every three hours across the temporal extent of data.

'3h'

min_max_q

tuple

Quantiles used to calculate max and min observed and modelled astronomical tides. By default (0.0, 1.0) which is equivalent to minimum and maximum; to use a softer threshold that is more robust to outliers, use e.g. (0.1, 0.9).

(0.0, 1.0)

round_stats

int

The number of decimal places used to round the output statistics. Defaults to 3.

3

**tag_tides_kwargs

Optional parameters passed to the eo_tides.eo.tag_tides function that is used to model tides for each observed and modelled timestep.

{}

Returns:

Name Type Description
stats_df Series or Dataframe

A pandas object containing the following statistics:

  • y: latitude used for modelling tide heights
  • x: longitude used for modelling tide heights
  • mot: mean tide height observed by the satellite (metres)
  • mat: mean modelled astronomical tide height (metres)
  • lot: minimum tide height observed by the satellite (metres)
  • lat: minimum tide height from modelled astronomical tidal range (metres)
  • hot: maximum tide height observed by the satellite (metres)
  • hat: maximum tide height from modelled astronomical tidal range (metres)
  • otr: tidal range observed by the satellite (metres)
  • tr: modelled astronomical tide range (metres)
  • spread: proportion of the full modelled tidal range observed by the satellite
  • offset_low: proportion of the lowest tides never observed by the satellite
  • offset_high: proportion of the highest tides never observed by the satellite
Source code in eo_tides/stats.py
def tide_stats(
    data: xr.Dataset | xr.DataArray | GeoBox,
    time: DatetimeLike | None = None,
    model: str = "EOT20",
    directory: str | os.PathLike | None = None,
    tidepost_lat: float | None = None,
    tidepost_lon: float | None = None,
    plain_english: bool = True,
    plot: bool = True,
    plot_var: str | None = None,
    point_col: str | None = None,
    modelled_freq: str = "3h",
    min_max_q: tuple = (0.0, 1.0),
    round_stats: int = 3,
    **tag_tides_kwargs,
) -> pd.Series:
    """
    Takes a multi-dimensional dataset and generate tide statistics
    and satellite-observed tide bias metrics, calculated based on
    every timestep in the satellite data and the geographic centroid
    of the imagery.

    By comparing the subset of tides observed by satellites
    against the full astronomical tidal range, we can evaluate
    whether the tides observed by satellites are biased
    (e.g. fail to observe either the highest or lowest tides) due
    to tide aliasing interactions with sun-synchronous satellite
    overpasses.

    For more information about the tidal statistics computed by this
    function, refer to Figure 8 in Bishop-Taylor et al. 2018:
    <https://www.sciencedirect.com/science/article/pii/S0272771418308783#fig8>

    Parameters
    ----------
    data : xarray.Dataset or xarray.DataArray or odc.geo.geobox.GeoBox
        A multi-dimensional dataset or GeoBox pixel grid that will
        be used to calculate tide statistics. If `data` is an
        xarray object, it should include a "time" dimension.
        If no "time" dimension exists or if `data` is a GeoBox,
        then times must be passed using the `time` parameter.
    time : DatetimeLike, optional
        By default, tides will be modelled using times from the
        "time" dimension of `data`. Alternatively, this param can
        be used to provide a custom set of times. Accepts any format
        that can be converted by `pandas.to_datetime()`. For example:
        `time=pd.date_range(start="2000", end="2001", freq="5h")`
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        If a list is provided, the resulting statistics will be
        returned as a `pandas.Dataframe`; otherwise a `pandas.Series`.
        Defaults to "EOT20"; specify "all" to use all models available
        in `directory`. For a full list of available and supported
        models, run `eo_tides.model.list_models`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    tidepost_lat, tidepost_lon : float or int, optional
        Optional coordinates used to model tides. The default is None,
        which uses the centroid of the dataset as the tide modelling
        location.
    plain_english : bool, optional
        An optional boolean indicating whether to print a plain english
        version of the tidal statistics to the screen. Defaults to True;
        only supported when a single tide model is passed to `model`.
    plot : bool, optional
        An optional boolean indicating whether to plot how satellite-
        observed tide heights compare against the full tidal range.
        Defaults to True; only supported when a single tide model is
        passed to `model`.
    plot_var : str, optional
        Optional name of a coordinate, dimension or variable in the array
        that will be used to plot observations with unique symbols.
        Defaults to None, which will plot all observations as circles.
    point_col : str, optional
        Colour used to plot points on the graph. Defaults to None which
        will automatically select colours.
    modelled_freq : str, optional
        An optional string giving the frequency at which to model tides
        when computing the full modelled tidal range. Defaults to '3h',
        which computes a tide height for every three hours across the
        temporal extent of `data`.
    min_max_q : tuple, optional
        Quantiles used to calculate max and min observed and modelled
        astronomical tides. By default `(0.0, 1.0)` which is equivalent
        to minimum and maximum; to use a softer threshold that is more
        robust to outliers, use e.g. `(0.1, 0.9)`.
    round_stats : int, optional
        The number of decimal places used to round the output statistics.
        Defaults to 3.
    **tag_tides_kwargs :
        Optional parameters passed to the `eo_tides.eo.tag_tides`
        function that is used to model tides for each observed and
        modelled timestep.

    Returns
    -------
    stats_df : pandas.Series or pandas.Dataframe
        A pandas object containing the following statistics:

        - `y`: latitude used for modelling tide heights
        - `x`: longitude used for modelling tide heights
        - `mot`: mean tide height observed by the satellite (metres)
        - `mat`: mean modelled astronomical tide height (metres)
        - `lot`: minimum tide height observed by the satellite (metres)
        - `lat`: minimum tide height from modelled astronomical tidal range (metres)
        - `hot`: maximum tide height observed by the satellite (metres)
        - `hat`: maximum tide height from modelled astronomical tidal range (metres)
        - `otr`: tidal range observed by the satellite (metres)
        - `tr`: modelled astronomical tide range (metres)
        - `spread`: proportion of the full modelled tidal range observed by the satellite
        - `offset_low`: proportion of the lowest tides never observed by the satellite
        - `offset_high`: proportion of the highest tides never observed by the satellite
    """

    # Standardise data inputs, time and models
    gbox, obs_times = _standardise_inputs(data, time)

    # Generate range of times covering entire period of satellite record
    assert obs_times is not None
    all_times = pd.date_range(
        start=obs_times.min().item(),
        end=obs_times.max().item(),
        freq=modelled_freq,
    )

    # If custom tide modelling locations are not provided, use the
    # dataset centroid
    if not tidepost_lat or not tidepost_lon:
        tidepost_lon, tidepost_lat = gbox.geographic_extent.centroid.coords[0]

    # Model tides for observed timesteps
    obs_tides_da = tag_tides(
        gbox,
        time=obs_times,
        model=model,
        directory=directory,
        tidepost_lat=tidepost_lat,  # type: ignore
        tidepost_lon=tidepost_lon,  # type: ignore
        **tag_tides_kwargs,
    )

    # Model tides for all modelled timesteps
    all_tides_da = tag_tides(
        gbox,
        time=all_times,
        model=model,
        directory=directory,
        tidepost_lat=tidepost_lat,  # type: ignore
        tidepost_lon=tidepost_lon,  # type: ignore
        **tag_tides_kwargs,
    )

    # Calculate statistics
    stats_ds = _tide_statistics(obs_tides_da, all_tides_da, min_max_q=min_max_q)

    # Convert to pandas and add tide post coordinates
    stats_df = stats_ds.to_pandas().astype("float32")
    stats_df["x"] = tidepost_lon
    stats_df["y"] = tidepost_lat

    # Convert coordinates to index if dataframe
    if isinstance(stats_df, pd.DataFrame):
        stats_df = stats_df.set_index(["x", "y"], append=True)

    # If a series, print and plot summaries
    else:
        if plain_english:
            _stats_plain_english(
                mot=stats_df.mot,
                mat=stats_df.mat,
                hot=stats_df.hot,
                hat=stats_df.hat,
                lot=stats_df.lot,
                lat=stats_df.lat,
                otr=stats_df.otr,
                tr=stats_df.tr,
                spread=stats_df.spread,
                offset_low=stats_df.offset_low,
                offset_high=stats_df.offset_high,
            )

        if plot:
            _stats_figure(
                all_tides_da=all_tides_da,
                obs_tides_da=obs_tides_da,
                hot=stats_df.hot,
                hat=stats_df.hat,
                lot=stats_df.lot,
                lat=stats_df.lat,
                spread=stats_df.spread,
                offset_low=stats_df.offset_low,
                offset_high=stats_df.offset_high,
                plot_var=data[plot_var] if plot_var else None,
                point_col=point_col,
            )

    # Return in Pandas format
    return stats_df.round(round_stats)

eo_tides.validation

Functions:

Name Description
eval_metrics

Calculate a set of common statistical metrics

load_gauge_gesla

Load Global Extreme Sea Level Analysis (GESLA) tide gauge data.

eval_metrics

eval_metrics(x, y, round=3, all_regress=False)

Calculate a set of common statistical metrics based on two input actual and predicted vectors.

These include:

  • Pearson correlation
  • Root Mean Squared Error
  • Mean Absolute Error
  • R-squared
  • Bias
  • Linear regression parameters (slope, p-value, intercept, standard error)

Parameters:

Name Type Description Default

x

array

An array providing "actual" variable values.

required

y

array

An array providing "predicted" variable values.

required

round

int

Number of decimal places to round each metric to. Defaults to 3.

3

all_regress

bool

Whether to return linear regression p-value, intercept and standard error (in addition to only regression slope). Defaults to False.

False

Returns:

Type Description
Series

A pd.Series containing all calculated metrics.

Source code in eo_tides/validation.py
def eval_metrics(x, y, round=3, all_regress=False):
    """
    Calculate a set of common statistical metrics
    based on two input actual and predicted vectors.

    These include:

    * Pearson correlation
    * Root Mean Squared Error
    * Mean Absolute Error
    * R-squared
    * Bias
    * Linear regression parameters (slope, p-value, intercept, standard error)

    Parameters
    ----------
    x : numpy.array
        An array providing "actual" variable values.
    y : numpy.array
        An array providing "predicted" variable values.
    round : int
        Number of decimal places to round each metric
        to. Defaults to 3.
    all_regress : bool
        Whether to return linear regression p-value,
        intercept and standard error (in addition to
        only regression slope). Defaults to False.

    Returns
    -------
    pandas.Series
        A `pd.Series` containing all calculated metrics.
    """

    # Create dataframe to drop na
    xy_df = pd.DataFrame({"x": x, "y": y}).dropna()

    # Compute linear regression
    lin_reg = stats.linregress(x=xy_df.x, y=xy_df.y)

    # Calculate statistics
    stats_dict = {
        "Correlation": xy_df.corr().iloc[0, 1],
        "RMSE": sqrt(mean_squared_error(xy_df.x, xy_df.y)),
        "MAE": mean_absolute_error(xy_df.x, xy_df.y),
        "R-squared": lin_reg.rvalue**2,
        "Bias": (xy_df.y - xy_df.x).mean(),
        "Regression slope": lin_reg.slope,
    }

    # Additional regression params
    if all_regress:
        stats_dict.update({
            "Regression p-value": lin_reg.pvalue,
            "Regression intercept": lin_reg.intercept,
            "Regression standard error": lin_reg.stderr,
        })

    # Return as
    return pd.Series(stats_dict).round(round)

load_gauge_gesla

load_gauge_gesla(
    x=None,
    y=None,
    site_code=None,
    time=("2018", "2020"),
    max_distance=None,
    correct_mean=False,
    filter_use_flag=True,
    site_metadata=True,
    data_path="/gdata1/data/sea_level/gesla/",
    metadata_path="/gdata1/data/sea_level/GESLA3_ALL 2.csv",
)

Load Global Extreme Sea Level Analysis (GESLA) tide gauge data.

Load and process all available GESLA measured sea-level data with an x, y, time spatio-temporal query, or from a list of specific tide gauges. Can optionally filter by gauge quality and append detailed gauge metadata.

Modified from original code in https://github.com/philiprt/GeslaDataset.

Parameters:

Name Type Description Default

x

numeric or list / tuple

Coordinates (in degrees longitude, latitude) used to load GESLA tide gauge observations. If provided as singular values (e.g. x=150, y=-32), then the nearest tide gauge will be returned. If provided as a list or tuple (e.g. x=(150, 152), y=(-32, -30)), then all gauges within the provided bounding box will be loaded. Leave as None to return all available gauges, or if providing a list of site codes using site_code.

None

y

numeric or list / tuple

Coordinates (in degrees longitude, latitude) used to load GESLA tide gauge observations. If provided as singular values (e.g. x=150, y=-32), then the nearest tide gauge will be returned. If provided as a list or tuple (e.g. x=(150, 152), y=(-32, -30)), then all gauges within the provided bounding box will be loaded. Leave as None to return all available gauges, or if providing a list of site codes using site_code.

None

site_code

str or list of str

GESLA site code(s) for which to load data (e.g. site_code="62650"). If site_code is provided, x and y will be ignored.

None

time

tuple or list of str

Time range to consider, given as a tuple of start and end dates, e.g. time=("2020", "2021"). The default of None will return all tide observations from the year 1800 onward.

('2018', '2020')

max_distance

numeric

Optional max distance within which to return the nearest tide gauge when x and y are provided as singular coordinates. Defaults to None, which will always return a tide gauge no matter how far away it is located from x and y.

None

correct_mean

bool

Whether to correct sea level measurements to a standardised mean sea level by subtracting the mean of all observed sea level observations. This can be useful when GESLA tide heights come from different or unknown tide datums. Note: the observed mean sea level calculated here may differ from true long-term/ astronomical Mean Sea Level (MSL) datum.

False

filter_use_flag

bool

Whether to filter out low quality observations with a "use_flag" value of 0 (do not use). Defaults to True.

True

site_metadata

bool

Whether to add tide gauge station metadata as additional columns in the output DataFrame. Defaults to True.

True

data_path

str

Path to the raw GESLA data files. Default is /gdata1/data/sea_level/gesla/.

'/gdata1/data/sea_level/gesla/'

metadata_path

str

Path to the GESLA station metadata file. Default is /gdata1/data/sea_level/GESLA3_ALL 2.csv.

'/gdata1/data/sea_level/GESLA3_ALL 2.csv'

Returns:

Type Description
DataFrame

Processed GESLA data as a DataFrame with columns including:

  • "time": Timestamps,
  • "sea_level": Observed sea level (m),
  • "qc_flag": Observed sea level QC flag,
  • "use_flag": Use-in-analysis flag (1 = use, 0 = do not use),

...and additional columns from station metadata.

Source code in eo_tides/validation.py
def load_gauge_gesla(
    x=None,
    y=None,
    site_code=None,
    time=("2018", "2020"),
    max_distance=None,
    correct_mean=False,
    filter_use_flag=True,
    site_metadata=True,
    data_path="/gdata1/data/sea_level/gesla/",
    metadata_path="/gdata1/data/sea_level/GESLA3_ALL 2.csv",
):
    """
    Load Global Extreme Sea Level Analysis (GESLA) tide gauge data.

    Load and process all available GESLA measured sea-level data
    with an `x, y, time` spatio-temporal query, or from a list of
    specific tide gauges. Can optionally filter by gauge quality
    and append detailed gauge metadata.

    Modified from original code in <https://github.com/philiprt/GeslaDataset>.

    Parameters
    ----------
    x, y : numeric or list/tuple, optional
        Coordinates (in degrees longitude, latitude) used to load GESLA
        tide gauge observations. If provided as singular values
        (e.g. `x=150, y=-32`), then the nearest tide gauge will be returned.
        If provided as a list or tuple (e.g. `x=(150, 152), y=(-32, -30)`),
        then all gauges within the provided bounding box will be loaded.
        Leave as `None` to return all available gauges, or if providing a
        list of site codes using `site_code`.
    site_code : str or list of str, optional
        GESLA site code(s) for which to load data (e.g. `site_code="62650"`).
        If `site_code` is provided, `x` and `y` will be ignored.
    time : tuple or list of str, optional
        Time range to consider, given as a tuple of start and end dates,
        e.g. `time=("2020", "2021")`. The default of None will return all
        tide observations from the year 1800 onward.
    max_distance : numeric, optional
        Optional max distance within which to return the nearest tide gauge
        when `x` and `y` are provided as singular coordinates. Defaults to
        None, which will always return a tide gauge no matter how far away
        it is located from `x` and `y`.
    correct_mean : bool, optional
        Whether to correct sea level measurements to a standardised mean
        sea level by subtracting the mean of all observed sea level
        observations. This can be useful when GESLA tide heights come
        from different or unknown tide datums. Note: the observed mean
        sea level calculated here may differ from true long-term/
        astronomical Mean Sea Level (MSL) datum.
    filter_use_flag : bool, optional
        Whether to filter out low quality observations with a "use_flag"
        value of 0 (do not use). Defaults to True.
    site_metadata : bool, optional
        Whether to add tide gauge station metadata as additional columns
        in the output DataFrame. Defaults to True.
    data_path : str, optional
        Path to the raw GESLA data files. Default is
        `/gdata1/data/sea_level/gesla/`.
    metadata_path : str, optional
        Path to the GESLA station metadata file.
        Default is `/gdata1/data/sea_level/GESLA3_ALL 2.csv`.

    Returns
    -------
    pd.DataFrame
        Processed GESLA data as a DataFrame with columns including:

        - "time": Timestamps,
        - "sea_level": Observed sea level (m),
        - "qc_flag": Observed sea level QC flag,
        - "use_flag": Use-in-analysis flag (1 = use, 0 = do not use),

        ...and additional columns from station metadata.
    """
    # Load tide gauge metadata
    metadata_df, metadata_gdf = _load_gauge_metadata(metadata_path)

    # Use supplied site codes if available
    if site_code is not None:
        site_code = [site_code] if not isinstance(site_code, list) else site_code

    # If x and y are tuples, use xy bounds to identify sites
    elif isinstance(x, (tuple, list)) & isinstance(y, (tuple, list)):
        bbox = BoundingBox.from_xy(x, y)
        site_code = metadata_gdf.cx[bbox.left : bbox.right, bbox.top : bbox.bottom].index

    # If x and y are single numbers, select nearest row
    elif isinstance(x, Number) & isinstance(y, Number):
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            site_code = (
                _nearest_row(metadata_gdf, x, y, max_distance).rename({"index_right": "site_code"}, axis=1).site_code
            )
            # site_code = _nearest_row(metadata_gdf, x, y, max_distance).site_code

        # Raise exception if no valid tide gauges are found
        if site_code.isnull().all():
            raise Exception(f"No tide gauge found within {max_distance} degrees of {x}, {y}.")

    # Otherwise if all are None, return all available site codes
    elif (site_code is None) & (x is None) & (y is None):
        site_code = metadata_df.index.to_list()

    else:
        raise TypeError(
            "`x` and `y` must be provided as either singular coordinates (e.g. `x=150`), or as a tuple bounding box (e.g. `x=(150, 152)`)."
        )

    # Prepare times
    if time is None:
        time = ["1800", str(datetime.datetime.now().year)]
    time = [time] if not isinstance(time, (list, tuple)) else time
    start_time = _round_date_strings(time[0], round_type="start")
    end_time = _round_date_strings(time[-1], round_type="end")

    # Identify paths to load and nodata values for each site
    metadata_df["file_name"] = data_path + metadata_df["file_name"]
    paths_na = metadata_df.loc[site_code, ["file_name", "null_value"]]

    # Load and combine into a single dataframe
    gauge_list = [
        _load_gesla_dataset(s, p, na_value=na)
        for s, p, na in tqdm.tqdm(paths_na.itertuples(), total=len(paths_na), desc="Loading GESLA gauges")
    ]
    data_df = pd.concat(gauge_list).sort_index().loc[slice(start_time, end_time)].reset_index().set_index("site_code")

    # Optionally filter by use flag column
    if filter_use_flag:
        data_df = data_df.loc[data_df.use_flag == 1]

    # Optionally insert metadata into dataframe
    if site_metadata:
        data_df[metadata_df.columns] = metadata_df.loc[site_code]

    # Add time to index and remove duplicates
    data_df = data_df.set_index("time", append=True)
    duplicates = data_df.index.duplicated()
    if duplicates.sum() > 0:
        warnings.warn("Duplicate timestamps were removed.")
        data_df = data_df.loc[~duplicates]

    # Remove observed mean sea level if requested
    if correct_mean:
        data_df["sea_level"] = data_df["sea_level"].sub(data_df.groupby("site_code")["sea_level"].transform("mean"))

    # Return data
    return data_df

eo_tides.utils

Functions:

Name Description
clip_models

Clip NetCDF-format ocean tide models to a bounding box.

idw

Perform Inverse Distance Weighting (IDW) interpolation.

list_models

List all tide models available for tide modelling.

clip_models

clip_models(
    input_directory,
    output_directory,
    bbox,
    model=None,
    buffer=5,
    overwrite=False,
)

Clip NetCDF-format ocean tide models to a bounding box.

This function identifies all NetCDF-format tide models in a given input directory, including "ATLAS-netcdf" (e.g. TPXO9-atlas-nc), "FES-netcdf" (e.g. FES2022, EOT20), and "GOT-netcdf" (e.g. GOT5.5) format files. Files for each model are then clipped to the extent of the provided bounding box, handling model-specific file structures. After each model is clipped, the result is exported to the output directory and verified with pyTMD to ensure the clipped data is suitable for tide modelling.

For instructions on accessing and downloading tide models, see: https://geoscienceaustralia.github.io/eo-tides/setup/

Parameters:

Name Type Description Default

input_directory

str or PathLike

Path to directory containing input NetCDF-format tide model files.

required

output_directory

str or PathLike

Path to directory where clipped NetCDF files will be exported.

required

bbox

tuple of float

Bounding box for clipping the tide models in EPSG:4326 degrees coordinates, specified as (left, bottom, right, top).

required

model

str or list of str

The tide model (or models) to clip. Defaults to None, which will automatically identify and clip all NetCDF-format models in the input directly.

None

buffer

float

Buffer distance (in degrees) added to the bounding box to provide sufficient data on edges of study area. Defaults to 5 degrees.

5

overwrite

bool

If True, overwrite existing files in the output directory. Defaults to False.

False

Examples:

>>> clip_models(
...     input_directory="tide_models/",
...     output_directory="tide_models_clipped/",
...     bbox=(-8.968392, 50.070574, 2.447160, 59.367122),
... )
Source code in eo_tides/utils.py
def clip_models(
    input_directory: str | os.PathLike,
    output_directory: str | os.PathLike,
    bbox: tuple[float, float, float, float],
    model: list | None = None,
    buffer: float = 5,
    overwrite: bool = False,
):
    """
    Clip NetCDF-format ocean tide models to a bounding box.

    This function identifies all NetCDF-format tide models in a
    given input directory, including "ATLAS-netcdf" (e.g. TPXO9-atlas-nc),
    "FES-netcdf" (e.g. FES2022, EOT20), and "GOT-netcdf" (e.g. GOT5.5)
    format files. Files for each model are then clipped to the extent of
    the provided bounding box, handling model-specific file structures.
    After each model is clipped, the result is exported to the output
    directory and verified with `pyTMD` to ensure the clipped data is
    suitable for tide modelling.

    For instructions on accessing and downloading tide models, see:
    <https://geoscienceaustralia.github.io/eo-tides/setup/>

    Parameters
    ----------
    input_directory : str or os.PathLike
        Path to directory containing input NetCDF-format tide model files.
    output_directory : str or os.PathLike
        Path to directory where clipped NetCDF files will be exported.
    bbox : tuple of float
        Bounding box for clipping the tide models in EPSG:4326 degrees
        coordinates, specified as `(left, bottom, right, top)`.
    model : str or list of str, optional
        The tide model (or models) to clip. Defaults to None, which
        will automatically identify and clip all NetCDF-format models
        in the input directly.
    buffer : float, optional
        Buffer distance (in degrees) added to the bounding box to provide
        sufficient data on edges of study area. Defaults to 5 degrees.
    overwrite : bool, optional
        If True, overwrite existing files in the output directory.
        Defaults to False.

    Examples
    --------
    >>> clip_models(
    ...     input_directory="tide_models/",
    ...     output_directory="tide_models_clipped/",
    ...     bbox=(-8.968392, 50.070574, 2.447160, 59.367122),
    ... )
    """

    # Get input and output paths
    input_directory = _set_directory(input_directory)
    output_directory = pathlib.Path(output_directory)

    # Prepare bounding box
    bbox = odc.geo.geom.BoundingBox(*bbox, crs="EPSG:4326").buffered(buffer)

    # Identify NetCDF models
    model_database = load_database()["elevation"]
    netcdf_formats = ["ATLAS-netcdf", "FES-netcdf", "GOT-netcdf"]
    netcdf_models = {k for k, v in model_database.items() if v["format"] in netcdf_formats}

    # Identify subset of available and requested NetCDF models
    available_models, _ = list_models(directory=input_directory, show_available=False, show_supported=False)
    requested_models = list(np.atleast_1d(model)) if model is not None else available_models
    available_netcdf_models = list(set(available_models) & set(requested_models) & set(netcdf_models))

    # Raise error if no valid models found
    if len(available_netcdf_models) == 0:
        raise ValueError(f"No valid NetCDF models found in {input_directory}.")

    # If model list is provided,
    print(f"Preparing to clip suitable NetCDF models: {available_netcdf_models}\n")

    # Loop through suitable models and export
    for m in available_netcdf_models:
        # Get model file and grid file list if they exist
        model_files = model_database[m].get("model_file", [])
        grid_file = model_database[m].get("grid_file", [])

        # Convert to list if strings and combine
        model_files = model_files if isinstance(model_files, list) else [model_files]
        grid_file = grid_file if isinstance(grid_file, list) else [grid_file]
        all_files = model_files + grid_file

        # Loop through each model file and clip
        for file in tqdm(all_files, desc=f"Clipping {m}"):
            # Skip if it exists in output directory
            if (output_directory / file).exists() and not overwrite:
                continue

            # Load model file
            nc = xr.open_mfdataset(input_directory / file)

            # Open file and clip according to model
            if m in (
                "GOT5.5",
                "GOT5.5_load",
                "GOT5.5_extrapolated",
                "GOT5.5D",
                "GOT5.5D_extrapolated",
                "GOT5.6",
                "GOT5.6_extrapolated",
            ):
                nc_clipped = _clip_model_file(
                    nc,
                    bbox,
                    xdim="lon",
                    ydim="lat",
                    ycoord="latitude",
                    xcoord="longitude",
                )

            elif m in ("HAMTIDE11",):
                nc_clipped = _clip_model_file(nc, bbox, xdim="LON", ydim="LAT", ycoord="LAT", xcoord="LON")

            elif m in (
                "EOT20",
                "EOT20_load",
                "FES2012",
                "FES2014",
                "FES2014_extrapolated",
                "FES2014_load",
                "FES2022",
                "FES2022_extrapolated",
                "FES2022_load",
            ):
                nc_clipped = _clip_model_file(nc, bbox, xdim="lon", ydim="lat", ycoord="lat", xcoord="lon")

            elif m in (
                "TPXO8-atlas-nc",
                "TPXO9-atlas-nc",
                "TPXO9-atlas-v2-nc",
                "TPXO9-atlas-v3-nc",
                "TPXO9-atlas-v4-nc",
                "TPXO9-atlas-v5-nc",
                "TPXO10-atlas-v2-nc",
            ):
                nc_clipped = _clip_model_file(
                    nc,
                    bbox,
                    xdim="nx",
                    ydim="ny",
                    ycoord="lat_z",
                    xcoord="lon_z",
                )

            else:
                raise Exception(f"Model {m} not supported")

            # Create directory and export
            (output_directory / file).parent.mkdir(parents=True, exist_ok=True)
            nc_clipped.to_netcdf(output_directory / file, mode="w")

        # Verify that models are ready
        pytmd_model(directory=output_directory).elevation(m=m).verify
        print(" ✅ Clipped model exported and verified")

    print(f"\nOutputs exported to {output_directory}")
    list_models(directory=output_directory, show_available=True, show_supported=False)

idw

idw(
    input_z,
    input_x,
    input_y,
    output_x,
    output_y,
    p=1,
    k=10,
    max_dist=None,
    k_min=1,
    epsilon=1e-12,
)

Perform Inverse Distance Weighting (IDW) interpolation.

This function performs fast IDW interpolation by creating a KDTree from the input coordinates then uses it to find the k nearest neighbors for each output point. Weights are calculated based on the inverse distance to each neighbor, with weights descreasing with increasing distance.

Code inspired by: https://github.com/DahnJ/REM-xarray

Parameters:

Name Type Description Default

input_z

array - like

Array of values at the input points. This can be either a 1-dimensional array, or a 2-dimensional array where each column (axis=1) represents a different set of values to be interpolated.

required

input_x

array - like

Array of x-coordinates of the input points.

required

input_y

array - like

Array of y-coordinates of the input points.

required

output_x

array - like

Array of x-coordinates where the interpolation is to be computed.

required

output_y

array - like

Array of y-coordinates where the interpolation is to be computed.

required

p

int or float

Power function parameter defining how rapidly weightings should decrease as distance increases. Higher values of p will cause weights for distant points to decrease rapidly, resulting in nearby points having more influence on predictions. Defaults to 1.

1

k

int

Number of nearest neighbors to use for interpolation. k=1 is equivalent to "nearest" neighbour interpolation. Defaults to 10.

10

max_dist

int or float

Restrict neighbouring points to less than this distance. By default, no distance limit is applied.

None

k_min

int

If max_dist is provided, some points may end up with less than k nearest neighbours, potentially producing less reliable interpolations. Set k_min to set any points with less than k_min neighbours to NaN. Defaults to 1.

1

epsilon

float

Small value added to distances to prevent division by zero errors in the case that output coordinates are identical to input coordinates. Defaults to 1e-12.

1e-12

Returns:

Name Type Description
interp_values ndarray

Interpolated values at the output coordinates. If input_z is 1-dimensional, interp_values will also be 1-dimensional. If input_z is 2-dimensional, interp_values will have the same number of rows as input_z, with each column (axis=1) representing interpolated values for one set of input data.

Examples:

>>> input_z = [1, 2, 3, 4, 5]
>>> input_x = [0, 1, 2, 3, 4]
>>> input_y = [0, 1, 2, 3, 4]
>>> output_x = [0.5, 1.5, 2.5]
>>> output_y = [0.5, 1.5, 2.5]
>>> idw(input_z, input_x, input_y, output_x, output_y, k=2)
array([1.5, 2.5, 3.5])
Source code in eo_tides/utils.py
def idw(
    input_z,
    input_x,
    input_y,
    output_x,
    output_y,
    p=1,
    k=10,
    max_dist=None,
    k_min=1,
    epsilon=1e-12,
):
    """Perform Inverse Distance Weighting (IDW) interpolation.

    This function performs fast IDW interpolation by creating a KDTree
    from the input coordinates then uses it to find the `k` nearest
    neighbors for each output point. Weights are calculated based on the
    inverse distance to each neighbor, with weights descreasing with
    increasing distance.

    Code inspired by: <https://github.com/DahnJ/REM-xarray>

    Parameters
    ----------
    input_z : array-like
        Array of values at the input points. This can be either a
        1-dimensional array, or a 2-dimensional array where each column
        (axis=1) represents a different set of values to be interpolated.
    input_x : array-like
        Array of x-coordinates of the input points.
    input_y : array-like
        Array of y-coordinates of the input points.
    output_x : array-like
        Array of x-coordinates where the interpolation is to be computed.
    output_y : array-like
        Array of y-coordinates where the interpolation is to be computed.
    p : int or float, optional
        Power function parameter defining how rapidly weightings should
        decrease as distance increases. Higher values of `p` will cause
        weights for distant points to decrease rapidly, resulting in
        nearby points having more influence on predictions. Defaults to 1.
    k : int, optional
        Number of nearest neighbors to use for interpolation. `k=1` is
        equivalent to "nearest" neighbour interpolation. Defaults to 10.
    max_dist : int or float, optional
        Restrict neighbouring points to less than this distance.
        By default, no distance limit is applied.
    k_min : int, optional
        If `max_dist` is provided, some points may end up with less than
        `k` nearest neighbours, potentially producing less reliable
        interpolations. Set `k_min` to set any points with less than
        `k_min` neighbours to NaN. Defaults to 1.
    epsilon : float, optional
        Small value added to distances to prevent division by zero
        errors in the case that output coordinates are identical to
        input coordinates. Defaults to 1e-12.

    Returns
    -------
    interp_values : numpy.ndarray
        Interpolated values at the output coordinates. If `input_z` is
        1-dimensional, `interp_values` will also be 1-dimensional. If
        `input_z` is 2-dimensional, `interp_values` will have the same
        number of rows as `input_z`, with each column (axis=1)
        representing interpolated values for one set of input data.

    Examples
    --------
    >>> input_z = [1, 2, 3, 4, 5]
    >>> input_x = [0, 1, 2, 3, 4]
    >>> input_y = [0, 1, 2, 3, 4]
    >>> output_x = [0.5, 1.5, 2.5]
    >>> output_y = [0.5, 1.5, 2.5]
    >>> idw(input_z, input_x, input_y, output_x, output_y, k=2)
    array([1.5, 2.5, 3.5])

    """
    # Convert to numpy arrays
    input_x = np.atleast_1d(input_x)
    input_y = np.atleast_1d(input_y)
    input_z = np.atleast_1d(input_z)
    output_x = np.atleast_1d(output_x)
    output_y = np.atleast_1d(output_y)

    # Verify input and outputs have matching lengths
    if not (input_z.shape[0] == len(input_x) == len(input_y)):
        raise ValueError("All of `input_z`, `input_x` and `input_y` must be the same length.")
    if not (len(output_x) == len(output_y)):
        raise ValueError("Both `output_x` and `output_y` must be the same length.")

    # Verify k is smaller than total number of points, and non-zero
    if k > input_z.shape[0]:
        raise ValueError(
            f"The requested number of nearest neighbours (`k={k}`) "
            f"is smaller than the total number of points ({input_z.shape[0]}).",
        )
    if k == 0:
        raise ValueError("Interpolation based on `k=0` nearest neighbours is not valid.")

    # Create KDTree to efficiently find nearest neighbours
    points_xy = np.column_stack((input_y, input_x))
    tree = KDTree(points_xy)

    # Determine nearest neighbours and distances to each
    grid_stacked = np.column_stack((output_y, output_x))
    distances, indices = tree.query(grid_stacked, k=k, workers=-1)

    # If k == 1, add an additional axis for consistency
    if k == 1:
        distances = distances[..., np.newaxis]
        indices = indices[..., np.newaxis]

    # Add small epsilon to distances to prevent division by zero errors
    # if output coordinates are the same as input coordinates
    distances = np.maximum(distances, epsilon)

    # Set distances above max to NaN if specified
    if max_dist is not None:
        distances[distances > max_dist] = np.nan

    # Calculate weights based on distance to k nearest neighbours.
    weights = 1 / np.power(distances, p)
    weights = weights / np.nansum(weights, axis=1).reshape(-1, 1)

    # 1D case: Compute weighted sum of input_z values for each output point
    if input_z.ndim == 1:
        interp_values = np.nansum(weights * input_z[indices], axis=1)

    # 2D case: Compute weighted sum for each set of input_z values
    # weights[..., np.newaxis] adds a dimension for broadcasting
    else:
        interp_values = np.nansum(
            weights[..., np.newaxis] * input_z[indices],
            axis=1,
        )

    # Set any points with less than `k_min` valid weights to NaN
    interp_values[np.isfinite(weights).sum(axis=1) < k_min] = np.nan

    return interp_values

list_models

list_models(
    directory=None,
    show_available=True,
    show_supported=True,
    raise_error=False,
)

List all tide models available for tide modelling.

This function scans the specified tide model directory and returns a list of models that are available in the directory as well as the full list of all models supported by eo-tides and pyTMD.

For instructions on setting up tide models, see: https://geoscienceaustralia.github.io/eo-tides/setup/

Parameters:

Name Type Description Default

directory

str

The directory containing tide model data files. If no path is provided, this will default to the environment variable EO_TIDES_TIDE_MODELS if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by pyTMD (https://geoscienceaustralia.github.io/eo-tides/setup/).

None

show_available

bool

Whether to print a list of locally available models.

True

show_supported

bool

Whether to print a list of all supported models, in addition to models available locally.

True

raise_error

bool

If True, raise an error if no available models are found. If False, raise a warning.

False

Returns:

Name Type Description
available_models list of str

A list of all tide models available within directory.

supported_models list of str

A list of all tide models supported by eo-tides.

Source code in eo_tides/utils.py
def list_models(
    directory: str | os.PathLike | None = None,
    show_available: bool = True,
    show_supported: bool = True,
    raise_error: bool = False,
) -> tuple[list[str], list[str]]:
    """
    List all tide models available for tide modelling.

    This function scans the specified tide model directory
    and returns a list of models that are available in the
    directory as well as the full list of all models supported
    by `eo-tides` and `pyTMD`.

    For instructions on setting up tide models, see:
    <https://geoscienceaustralia.github.io/eo-tides/setup/>

    Parameters
    ----------
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    show_available : bool, optional
        Whether to print a list of locally available models.
    show_supported : bool, optional
        Whether to print a list of all supported models, in
        addition to models available locally.
    raise_error : bool, optional
        If True, raise an error if no available models are found.
        If False, raise a warning.

    Returns
    -------
    available_models : list of str
        A list of all tide models available within `directory`.
    supported_models : list of str
        A list of all tide models supported by `eo-tides`.
    """
    init()  # Initialize colorama

    # Set tide modelling files directory. If no custom path is
    # provided, try global environment variable.
    directory = _set_directory(directory)

    # Get full list of supported models from pyTMD database
    model_database = load_database()["elevation"]
    supported_models = list(model_database.keys())

    # Extract expected model paths
    expected_paths = {}
    for m in supported_models:
        model_file = model_database[m]["model_file"]

        # Handle GOT5.6 differently to ensure we test for presence of GOT5.6 constituents
        if m in ("GOT5.6", "GOT5.6_extrapolated"):
            model_file = [file for file in model_file if "GOT5.6" in file][0]
        else:
            model_file = model_file[0] if isinstance(model_file, list) else model_file

        # Add path to dict
        expected_paths[m] = str(directory / pathlib.Path(model_file).expanduser().parent)

    # Define column widths
    status_width = 4  # Width for emoji
    name_width = max(len(name) for name in supported_models)
    path_width = max(len(path) for path in expected_paths.values())

    # Print list of supported models, marking available and
    # unavailable models and appending available to list
    if show_available or show_supported:
        total_width = min(status_width + name_width + path_width + 6, 80)
        print("─" * total_width)
        print(f"{'󠀠🌊':^{status_width}} | {'Model':<{name_width}} | {'Expected path':<{path_width}}")
        print("─" * total_width)

    available_models = []
    for m in supported_models:
        try:
            model_file = pytmd_model(directory=directory).elevation(m=m)
            available_models.append(m)

            if show_available:
                # Mark available models with a green tick
                status = "✅"
                print(f"{status:^{status_width}}{m:<{name_width}}{expected_paths[m]:<{path_width}}")
        except FileNotFoundError:
            if show_supported:
                # Mark unavailable models with a red cross
                status = "❌"
                print(
                    f"{status:^{status_width}}{Style.DIM}{m:<{name_width}}{expected_paths[m]:<{path_width}}{Style.RESET_ALL}"
                )

    if show_available or show_supported:
        print("─" * total_width)

        # Print summary
        print(f"\n{Style.BRIGHT}Summary:{Style.RESET_ALL}")
        print(f"Available models: {len(available_models)}/{len(supported_models)}")

    # Raise error or warning if no models are available
    if not available_models:
        warning_msg = textwrap.dedent(
            f"""
            No valid tide models are available in `{directory}`.
            Are you sure you have provided the correct `directory` path, or set the
            `EO_TIDES_TIDE_MODELS` environment variable to point to the location of your
            tide model directory?
            """
        ).strip()

        if raise_error:
            raise Exception(warning_msg)
        else:
            warnings.warn(warning_msg, UserWarning)

    # Return list of available and supported models
    return available_models, supported_models