API reference

eo_tides.model ¶

Core tide modelling functionality.

This module provides tools for modelling ocean tide heights and phases for any location or time period using one or more global tide models.

Functions:

Name	Description
`ensemble_tides`	Combine multiple tide models into a single locally optimised ensemble tide model.
`model_phases`	Model tide phases at multiple coordinates or timesteps using multiple ocean tide models.
`model_tides`	Model tide heights at multiple coordinates or timesteps using using multiple ocean tide models.

ensemble_tides ¶

ensemble_tides(
    tide_df,
    crs,
    ensemble_models,
    ensemble_func=None,
    ensemble_top_n=3,
    ranking_points="https://dea-public-data-dev.s3-ap-southeast-2.amazonaws.com/derivative/dea_intertidal/supplementary/rankings_ensemble_2017-2019.fgb",
    ranking_valid_perc=0.02,
    **idw_kwargs
)

Combine multiple tide models into a single locally optimised ensemble tide model.

Uses external model ranking data (e.g. satellite altimetry or NDWI-tide correlations along the coastline) to inform the selection of the best local models.

This function performs the following steps:

Takes a dataframe of tide heights from multiple tide models, as produced by eo_tides.model.model_tides
Loads model ranking points from an external file, filters them based on the valid data percentage, and retains relevant columns
Interpolates the model rankings into the coordinates of the original dataframe using Inverse Weighted Interpolation (IDW)
Uses rankings to combine multiple tide models into a single optimised ensemble model (by default, by taking the mean of the top 3 ranked models)
Returns a new dataframe with the combined ensemble model predictions

Parameters:

Name	Type	Description	Default
`tide_df` ¶	`DataFrame`	DataFrame produced by `eo_tides.model.model_tides`, containing tide model predictions in long format with columns: `["time", "x", "y", "tide_height", "tide_model"]`.	required
`crs` ¶	`string`	Coordinate reference system for the "x" and "y" coordinates in `tide_df`. Used to ensure that interpolations are performed in the correct CRS.	required
`ensemble_models` ¶	`list`	A list of models to include in the ensemble modelling process. All values must exist as columns with the prefix "rank_" in `ranking_points`.	required
`ensemble_func` ¶	`dict`	By default, a simple ensemble model will be calculated by taking the mean of the `ensemble_top_n` tide models at each location. However, a dictionary containing more complex ensemble calculations can also be provided. Dictionary keys are used to name output ensemble models; functions should take a column named "rank" and convert it to a weighting, e.g.: `ensemble_func = {"ensemble-custom": lambda x: x["rank"] <= 3}`	`None`
`ensemble_top_n` ¶	`int`	If `ensemble_func` is None, this sets the number of top models to include in the mean ensemble calculation. Defaults to 3.	`3`
`ranking_points` ¶	`str`	Path to the file containing model ranking points. This dataset should include columns containing rankings for each tide model, named with the prefix "rank_". e.g. "rank_EOT20". Low values should represent high rankings (e.g. 1 = top ranked). The default value points to an example file covering Australia.	`'https://dea-public-data-dev.s3-ap-southeast-2.amazonaws.com/derivative/dea_intertidal/supplementary/rankings_ensemble_2017-2019.fgb'`
`ranking_valid_perc` ¶	`float`	Minimum percentage of valid data required to include a model rank point in the analysis, as defined in a column named "valid_perc". Defaults to 0.02.	`0.02`
`**idw_kwargs` ¶		Optional keyword arguments to pass to the `idw` function used for interpolation. Useful values include `k` (number of nearest neighbours to use in interpolation), `max_dist` (maximum distance to nearest neighbours), and `k_min` (minimum number of neighbours required after `max_dist` is applied).	`{}`

Returns:

Type Description

DataFrame

DataFrame containing the ensemble model predictions, matching the format of the input tide_df (e.g. columns ["time", "x", "y", "tide_height", "tide_model"]. By default the 'tide_model' column will be labeled "ensemble" for the combined model predictions (but if a custom dictionary of ensemble functions is provided via ensemble_func, each ensemble will be named using the provided dictionary keys).

Source code in eo_tides/model.py

def ensemble_tides(
    tide_df,
    crs,
    ensemble_models,
    ensemble_func=None,
    ensemble_top_n=3,
    ranking_points="https://dea-public-data-dev.s3-ap-southeast-2.amazonaws.com/derivative/dea_intertidal/supplementary/rankings_ensemble_2017-2019.fgb",
    ranking_valid_perc=0.02,
    **idw_kwargs,
):
    """Combine multiple tide models into a single locally optimised ensemble tide model.

    Uses external model ranking data (e.g. satellite altimetry or
    NDWI-tide correlations along the coastline) to inform the
    selection of the best local models.

    This function performs the following steps:

    1. Takes a dataframe of tide heights from multiple tide models, as
       produced by `eo_tides.model.model_tides`
    2. Loads model ranking points from an external file, filters them
       based on the valid data percentage, and retains relevant columns
    3. Interpolates the model rankings into the coordinates of the
       original dataframe using Inverse Weighted Interpolation (IDW)
    4. Uses rankings to combine multiple tide models into a single
       optimised ensemble model (by default, by taking the mean of the
       top 3 ranked models)
    5. Returns a new dataframe with the combined ensemble model predictions

    Parameters
    ----------
    tide_df : pandas.DataFrame
        DataFrame produced by `eo_tides.model.model_tides`, containing
        tide model predictions in long format with columns:
        `["time", "x", "y", "tide_height", "tide_model"]`.
    crs : string
        Coordinate reference system for the "x" and "y" coordinates in
        `tide_df`. Used to ensure that interpolations are performed
        in the correct CRS.
    ensemble_models : list
        A list of models to include in the ensemble modelling process.
        All values must exist as columns with the prefix "rank_" in
        `ranking_points`.
    ensemble_func : dict, optional
        By default, a simple ensemble model will be calculated by taking
        the mean of the `ensemble_top_n` tide models at each location.
        However, a dictionary containing more complex ensemble
        calculations can also be provided. Dictionary keys are used
        to name output ensemble models; functions should take a column
        named "rank" and convert it to a weighting, e.g.:
        `ensemble_func = {"ensemble-custom": lambda x: x["rank"] <= 3}`
    ensemble_top_n : int, optional
        If `ensemble_func` is None, this sets the number of top models
        to include in the mean ensemble calculation. Defaults to 3.
    ranking_points : str, optional
        Path to the file containing model ranking points. This dataset
        should include columns containing rankings for each tide
        model, named with the prefix "rank_". e.g. "rank_EOT20".
        Low values should represent high rankings (e.g. 1 = top ranked).
        The default value points to an example file covering Australia.
    ranking_valid_perc : float, optional
        Minimum percentage of valid data required to include a model
        rank point in the analysis, as defined in a column named
        "valid_perc". Defaults to 0.02.
    **idw_kwargs
        Optional keyword arguments to pass to the `idw` function used
        for interpolation. Useful values include `k` (number of nearest
        neighbours to use in interpolation), `max_dist` (maximum
        distance to nearest neighbours), and `k_min` (minimum number of
        neighbours required after `max_dist` is applied).

    Returns
    -------
    pandas.DataFrame
        DataFrame containing the ensemble model predictions, matching
        the format of the input `tide_df` (e.g. columns `["time", "x",
        "y", "tide_height", "tide_model"]`. By default the 'tide_model'
        column will be labeled "ensemble" for the combined model
        predictions (but if a custom dictionary of ensemble functions is
        provided via `ensemble_func`, each ensemble will be named using
        the provided dictionary keys).

    """
    # Raise data if `tide_df` provided in wide format
    if "tide_model" not in tide_df:
        err_msg = (
            "`tide_df` does not contain the expected 'tide_model' and "
            "'tide_height' columns. Ensure that tides were modelled in "
            "long format (i.e. `output_format='long'` in `model_tides`).",
        )
        raise Exception(err_msg)

    # Extract x and y coords from dataframe
    x = tide_df.index.get_level_values(level="x")
    y = tide_df.index.get_level_values(level="y")

    # Identify input datatype
    input_dtype = tide_df.tide_height.dtype

    # Load model ranks points and reproject to same CRS as x and y
    model_ranking_cols = [f"rank_{m}" for m in ensemble_models]
    try:
        model_ranks_gdf = (
            gpd.read_file(ranking_points, engine="pyogrio")
            .to_crs(crs)
            .query(f"valid_perc > {ranking_valid_perc}")
            .dropna(how="all")
            .filter(model_ranking_cols + ["geometry"])  # noqa: RUF005
        )
    except KeyError:
        error_msg = f"""
        Not all of the expected "rank_" columns {model_ranking_cols} were
        found in the columns of the ranking points file ({ranking_points}).
        Consider passing a custom list of models using `ensemble_models`.
        """
        raise Exception(textwrap.dedent(error_msg).strip()) from None

    # Use points to interpolate model rankings into requested x and y
    id_kwargs_str = "" if idw_kwargs == {} else idw_kwargs
    print(f"Interpolating model rankings using IDW interpolation {id_kwargs_str}")
    ensemble_ranks_df = (
        # Run IDW interpolation on subset of ranking columns
        pd.DataFrame(
            idw(
                input_z=model_ranks_gdf[model_ranking_cols],
                input_x=model_ranks_gdf.geometry.x,
                input_y=model_ranks_gdf.geometry.y,
                output_x=x,
                output_y=y,
                **idw_kwargs,
            ),
            columns=model_ranking_cols,
        )
        .assign(x=x, y=y)
        # Drop any duplicates then melt columns into long format
        .drop_duplicates()
        .melt(id_vars=["x", "y"], var_name="tide_model", value_name="rank")
        # Remore "rank_" prefix to get plain model names
        .replace({"^rank_": ""}, regex=True)
        # Set index columns and rank across groups
        .set_index(["tide_model", "x", "y"])
        .groupby(["x", "y"])
        .rank()
        .astype("float32")  # use smaller dtype for rankings to save memory
    )

    # If no custom ensemble funcs are provided, use a default ensemble
    # calculation that takes the mean of the top N tide models
    if ensemble_func is None:
        ensemble_func = {"ensemble": lambda x: x["rank"] <= ensemble_top_n}

    # Create output list to hold computed ensemble model outputs
    ensemble_list = []

    # Loop through all provided ensemble generation functions
    for ensemble_n, ensemble_f in ensemble_func.items():
        print(f"Combining models into single {ensemble_n} model")

        # Join ranks to input tide data, compute weightings and group
        grouped = (
            # Add tide model as an index so we can join with model ranks
            tide_df.set_index("tide_model", append=True)
            .join(ensemble_ranks_df)
            # Add temp columns containing weightings and weighted values
            .assign(
                weights=ensemble_f,  # use custom func to compute weights
                weighted=lambda i: i.tide_height * i.weights,
            )
            # Groupby is specified in a weird order here as this seems
            # to be the easiest way to preserve correct index sorting
            .groupby(["x", "y", "time"])
        )

        # Use weightings to combine multiple models into single ensemble
        ensemble_df = (
            # Calculate weighted mean
            grouped.weighted.sum()
            .div(grouped.weights.sum())
            # Make sure datatype is the same as the input
            .astype(input_dtype)
            # Convert to dataframe
            .to_frame("tide_height")
            # Label ensemble model and ensure indexes are in expected order
            .assign(tide_model=ensemble_n)
            .reorder_levels(["time", "x", "y"], axis=0)
        )

        ensemble_list.append(ensemble_df)

    # Combine all ensemble models and return as a single dataframe
    return pd.concat(ensemble_list)

model_phases ¶

model_phases(
    x,
    y,
    time,
    model="EOT20",
    directory=None,
    time_offset="15 min",
    return_tides=False,
    **model_tides_kwargs
)

Model tide phases at multiple coordinates or timesteps using multiple ocean tide models.

Ebb and low phases (low-flow, high-flow, high-ebb, low-ebb) are calculated by running the eo_tides.model.model_tides function twice, once for the requested timesteps, and again after subtracting a small time offset (15 mins by default). If tides increased over this period, they are assigned as "flow"; if they decreased, they are assigned as "ebb". Tides are considered "high" if equal or greater than 0 metres tide height, otherwise "low".

This function supports all parameters that are supported by model_tides.

For tide model setup instructions, refer to the guide: https://geoscienceaustralia.github.io/eo-tides/setup/

Parameters:

Name	Type	Description	Default
`x` ¶	`float or list of floats`	One or more x coordinates at which to model tides. Assumes degrees longitude (EPSG:4326) by default; use `crs` to specify a different coordinate reference system.	required
`y` ¶	`float or list of floats`	One or more y coordinates at which to model tides. Assumes degrees latitude (EPSG:4326) by default; use `crs` to specify a different coordinate reference system.	required
`time` ¶	`DatetimeLike`	One or more UTC times at which to model tide heights. Accepts any time format compatible with `pandas.to_datetime()`, e.g. datetime.datetime, pd.Timestamp, pd.DatetimeIndex, numpy.datetime64, or date/time strings (e.g. "2020-01-01 23:00"). For example: `time = pd.date_range(start="2000", end="2001", freq="5h")`.	required
`model` ¶	`str or list of str`	The tide model (or list of models) to use to model tides. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, run `from eo_tides.utils import list_models; list_models()`.	`'EOT20'`
`directory` ¶	`str`	The directory containing tide model data files. If no path is provided, this will default to the environment variable `EO_TIDES_TIDE_MODELS` if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by `pyTMD` (https://geoscienceaustralia.github.io/eo-tides/setup/).	`None`
`time_offset` ¶	`str`	The time offset/delta used to generate a time series of offset tide heights required for phase calculation. Defeaults to "15 min"; can be any string passed to `pandas.Timedelta`.	`'15 min'`
`return_tides` ¶	`bool`	Whether to return intermediate modelled tide heights as a "tide_height" column in the output dataframe. Defaults to False.	`False`
`**model_tides_kwargs` ¶		Optional parameters passed to the `eo_tides.model.model_tides` function. Important parameters include `output_format` (e.g. whether to return results in wide or long format), `crop` (whether to crop tide model constituent files on-the-fly to improve performance) etc.	`{}`

Returns:

Type	Description
`DataFrame`	A dataframe containing modelled tide phases.

Source code in eo_tides/model.py

def model_phases(
    x: float | list[float] | xr.DataArray,
    y: float | list[float] | xr.DataArray,
    time: DatetimeLike,
    model: str | list[str] = "EOT20",
    directory: str | os.PathLike | None = None,
    time_offset: str = "15 min",
    return_tides: bool = False,
    **model_tides_kwargs,
) -> pd.DataFrame:
    """Model tide phases at multiple coordinates or timesteps using multiple ocean tide models.

    Ebb and low phases (low-flow, high-flow, high-ebb, low-ebb)
    are calculated by running the `eo_tides.model.model_tides`
    function twice, once for the requested timesteps, and again
    after subtracting a small time offset (15 mins by default).
    If tides increased over this period, they are assigned as
    "flow"; if they decreased, they are assigned as "ebb".
    Tides are considered "high" if equal or greater than 0
    metres tide height, otherwise "low".

    This function supports all parameters that are supported
    by `model_tides`.

    For tide model setup instructions, refer to the guide:
    https://geoscienceaustralia.github.io/eo-tides/setup/

    Parameters
    ----------
    x : float or list of floats
        One or more x coordinates at which to model tides. Assumes
        degrees longitude (EPSG:4326) by default; use `crs` to specify
        a different coordinate reference system.
    y : float or list of floats
        One or more y coordinates at which to model tides. Assumes
        degrees latitude (EPSG:4326) by default; use `crs` to specify
        a different coordinate reference system.
    time : DatetimeLike
        One or more UTC times at which to model tide heights. Accepts
        any time format compatible with `pandas.to_datetime()`, e.g.
        datetime.datetime, pd.Timestamp, pd.DatetimeIndex, numpy.datetime64,
        or date/time strings (e.g. "2020-01-01 23:00"). For example:
        `time = pd.date_range(start="2000", end="2001", freq="5h")`.
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        Defaults to "EOT20"; specify "all" to use all models available
        in `directory`. For a full list of available and supported models,
        run `from eo_tides.utils import list_models; list_models()`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    time_offset: str, optional
        The time offset/delta used to generate a time series of
        offset tide heights required for phase calculation. Defeaults
        to "15 min"; can be any string passed to `pandas.Timedelta`.
    return_tides: bool, optional
        Whether to return intermediate modelled tide heights as a
        "tide_height" column in the output dataframe. Defaults to False.
    **model_tides_kwargs :
        Optional parameters passed to the `eo_tides.model.model_tides`
        function. Important parameters include `output_format` (e.g.
        whether to return results in wide or long format), `crop`
        (whether to crop tide model constituent files on-the-fly to
        improve performance) etc.

    Returns
    -------
    pandas.DataFrame
        A dataframe containing modelled tide phases.

    """
    # Pop output format and mode for special handling
    output_format = model_tides_kwargs.pop("output_format", "long")
    mode = model_tides_kwargs.pop("mode", "one-to-many")

    # Model tides
    tide_df = model_tides(
        x=x,
        y=y,
        time=time,
        model=model,
        directory=directory,
        **model_tides_kwargs,
    )

    # Model tides for a time 15 minutes prior to each previously
    # modelled satellite acquisition time. This allows us to compare
    # tide heights to see if they are rising or falling.
    pre_df = model_tides(
        x=x,
        y=y,
        time=time - pd.Timedelta(time_offset),
        model=model,
        directory=directory,
        **model_tides_kwargs,
    )

    # Compare tides computed for each timestep. If the previous tide
    # was higher than the current tide, the tide is 'ebbing'. If the
    # previous tide was lower, the tide is 'flowing'
    ebb_flow = (tide_df.tide_height < pre_df.tide_height.to_numpy()).replace(
        {True: "ebb", False: "flow"},
    )

    # If tides are greater than 0, then "high", otherwise "low"
    high_low = (tide_df.tide_height >= 0).replace({True: "high", False: "low"})

    # Combine into one string and add to data
    tide_df["tide_phase"] = high_low.astype(str) + "-" + ebb_flow.astype(str)

    # Optionally convert to a wide format dataframe with a tide model in
    # each dataframe column
    if output_format == "wide":
        # Pivot into wide format with each time model as a column
        print("Converting to a wide format dataframe")
        tide_df = tide_df.pivot(columns="tide_model")  # noqa: PD010

        # If in 'one-to-one' mode, reindex using our input time/x/y
        # values to ensure the output is sorted the same as our inputs
        if mode == "one-to-one":
            output_indices = pd.MultiIndex.from_arrays(
                [time, x, y],
                names=["time", "x", "y"],
            )
            tide_df = tide_df.reindex(output_indices)

        # Optionally drop tides
        if not return_tides:
            return tide_df.drop("tide_height", axis=1)["tide_phase"]

    # Optionally drop tide heights
    if not return_tides:
        return tide_df.drop("tide_height", axis=1)

    return tide_df

model_tides ¶

model_tides(
    x,
    y,
    time,
    model="EOT20",
    directory=None,
    crs="EPSG:4326",
    mode="one-to-many",
    output_format="long",
    output_units="m",
    method="linear",
    extrapolate=True,
    cutoff=None,
    crop="auto",
    crop_buffer=5,
    append_node=False,
    constituents=None,
    parallel=True,
    parallel_splits="auto",
    parallel_max=None,
    ensemble_models=None,
    extra_databases=None,
    **ensemble_kwargs
)

Model tide heights at multiple coordinates or timesteps using using multiple ocean tide models.

This function is parallelised to improve performance, and supports all tidal models supported by pyTMD, including:

Empirical Ocean Tide model (EOT20)
Finite Element Solution tide models (FES2022, FES2014, FES2012)
TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)
Technical University of Denmark tide models (DTU23)

This function requires access to tide model data files. For tide model setup instructions, refer to the guide: https://geoscienceaustralia.github.io/eo-tides/setup/

This function is a modification of the pyTMD package's pyTMD.compute.tide_elevations function. For more info: https://pytmd.readthedocs.io/en/latest/api_reference/compute.html#pyTMD.compute.tide_elevations https://pytmd.readthedocs.io/en/latest/getting_started/Getting-Started.html#directories

Parameters:

Name	Type	Description	Default
`x` ¶	`float or list of floats`	One or more x coordinates at which to model tides. Assumes degrees longitude (EPSG:4326) by default; use `crs` to specify a different coordinate reference system.	required
`y` ¶	`float or list of floats`	One or more y coordinates at which to model tides. Assumes degrees latitude (EPSG:4326) by default; use `crs` to specify a different coordinate reference system.	required
`time` ¶	`DatetimeLike`	One or more UTC times at which to model tide heights. Accepts any time format compatible with `pandas.to_datetime()`, e.g. datetime.datetime, pd.Timestamp, pd.DatetimeIndex, numpy.datetime64, or date/time strings (e.g. "2020-01-01 23:00"). For example: `time = pd.date_range(start="2000", end="2001", freq="5h")`.	required
`model` ¶	`str or list of str`	The tide model (or list of models) to use to model tides. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, run `from eo_tides.utils import list_models; list_models()`.	`'EOT20'`
`directory` ¶	`str`	The directory containing tide model data files. If no path is provided, this will default to the environment variable `EO_TIDES_TIDE_MODELS` if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by `pyTMD` (https://geoscienceaustralia.github.io/eo-tides/setup/).	`None`
`crs` ¶	`str`	Input coordinate reference system for x/y coordinates. Defaults to "EPSG:4326" (degrees latitude, longitude).	`'EPSG:4326'`
`mode` ¶	`str`	Tide modelling analysis mode. Supports two options: `"one-to-many"`: Models tides at every x/y coordinate for every timestep in `time`. This is useful for Earth observation workflows where you want to model tides at many spatial points for a common set of acquisition times (e.g. satellite overpasses). `"one-to-one"`: Model tides using one timestep for each x/y coordinate. In this mode, the number of x/y coordinates must match the number of timesteps in `time`.	`'one-to-many'`
`output_format` ¶	`str`	Whether to return the output dataframe in long format (with results stacked vertically along "tide_model" and "tide_height" columns), or wide format (with a column for each tide model). Defaults to "long".	`'long'`
`output_units` ¶	`str`	Units for the returned tide heights. Options are: `"m"` (default): floating point values in metres `"cm"`: integer values in centimetres (x100) `"mm"`: integer values in millimetres (x1000) Using integer units can help reduce memory usage.	`'m'`
`method` ¶	`str`	Method used to interpolate tide model constituent files. Defaults to "linear"; options include: `"linear"`, `"nearest"`: scipy regular grid interpolations `"spline"`: scipy bivariate spline interpolation `"bilinear"`: quick bilinear interpolation	`'linear'`
`extrapolate` ¶	`bool`	If True (default), extrapolate tides inland of the valid tide model extent using nearest-neighbor interpolation. This can ensure tide are returned everywhere, but accuracy may degrade with distance from the valid model extent (e.g. inland or along complex estuaries or rivers). Set `cutoff` to define the maximum extrapolation distance.	`True`
`cutoff` ¶	`float`	Maximum distance in kilometres to extrapolate tides inland of the valid tide model extent. The default of None allows extrapolation at any (i.e. infinite) distance.	`None`
`crop` ¶	`bool or str`	Whether to crop tide model files on-the-fly to improve performance. Defaults to "auto", which enables cropping when supported (some clipped model files limited to the western hemisphere may not support on-the-fly cropping). Use `crop_buffer` to adjust the buffer distance used for cropping.	`'auto'`
`crop_buffer` ¶	`int or float`	The buffer distance in degrees to crop tide model files around the requested x/y coordinates. Defaults to 5, which will crop model files using a five degree buffer.	`5`
`append_node` ¶	`bool`	Apply adjustments to harmonic constituents to allow for periodic modulations over the 18.6-year nodal period (lunar nodal tide). Default is False.	`False`
`constituents` ¶	`list`	Optional list of tide constituents to use for tide prediction. Default is None, which will use all available constituents.	`None`
`parallel` ¶	`bool`	Whether to parallelise tide modelling. If multiple tide models are requested, these will be run in parallel. If enough workers are available, the analysis will also be split into spatial chunks for additional parallelisation (see "parallel_splits" below). Default is True.	`True`
`parallel_splits` ¶	`str or int`	Whether to split the input x and y coordinates into smaller, evenly-sized chunks that are processed in parallel. This can provide a large performance boost when processing large numbers of coordinates. The default is "auto", which will automatically attempt to determine optimal splits based on available CPUs, the number of input points, and the number of models.	`'auto'`
`parallel_max` ¶	`int`	Maximum number of processes to run in parallel. The default of None will automatically determine this from your available CPUs.	`None`
`ensemble_models` ¶	`list of str`	An optional list of models used to generate the ensemble tide model if "ensemble" tide modelling is requested. Defaults to `["EOT20", "FES2012", "FES2014_extrapolated", "FES2022_extrapolated", "GOT4.10", "GOT5.5_extrapolated", "GOT5.6_extrapolated", "TPXO10-atlas-v2-nc", "TPXO8-atlas-nc", "TPXO9-atlas-v5-nc"]`.	`None`
`extra_databases` ¶	`str or path or list`	Additional custom tide model definitions to load, provided as dictionaries or paths to JSON database files. Use this to enable custom tide models not included with `pyTMD`. See: https://pytmd.readthedocs.io/en/latest/getting_started/Getting-Started.html#model-database	`None`
`**ensemble_kwargs` ¶		Keyword arguments used to customise the generation of optional ensemble tide models if "ensemble" tide modelling is requested. These are passed to the underlying `_ensemble_model` function. Useful parameters include `ranking_points` (path to model rankings data), `k` (for controlling how model rankings are interpolated), and `ensemble_top_n` (how many top models to use in the ensemble calculation).	`{}`

Returns:

Type	Description
`DataFrame`	A dataframe containing modelled tide heights.

Source code in eo_tides/model.py

def model_tides(
    x: float | list[float] | xr.DataArray,
    y: float | list[float] | xr.DataArray,
    time: DatetimeLike,
    model: str | list[str] = "EOT20",
    directory: str | os.PathLike | None = None,
    crs: str = "EPSG:4326",
    mode: str = "one-to-many",
    output_format: str = "long",
    output_units: str = "m",
    method: str = "linear",
    extrapolate: bool = True,
    cutoff: float | None = None,
    crop: bool | str = "auto",
    crop_buffer: float | None = 5,
    append_node: bool = False,
    constituents: list[str] | None = None,
    parallel: bool = True,
    parallel_splits: int | str = "auto",
    parallel_max: int | None = None,
    ensemble_models: list[str] | None = None,
    extra_databases: str | os.PathLike | list | None = None,
    **ensemble_kwargs,
) -> pd.DataFrame:
    """Model tide heights at multiple coordinates or timesteps using using multiple ocean tide models.

    This function is parallelised to improve performance, and
    supports all tidal models supported by `pyTMD`, including:

    - Empirical Ocean Tide model (EOT20)
    - Finite Element Solution tide models (FES2022, FES2014, FES2012)
    - TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
    - Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
    - Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)
    - Technical University of Denmark tide models (DTU23)

    This function requires access to tide model data files.
    For tide model setup instructions, refer to the guide:
    https://geoscienceaustralia.github.io/eo-tides/setup/

    This function is a modification of the `pyTMD` package's
    `pyTMD.compute.tide_elevations` function. For more info:
    https://pytmd.readthedocs.io/en/latest/api_reference/compute.html#pyTMD.compute.tide_elevations
    https://pytmd.readthedocs.io/en/latest/getting_started/Getting-Started.html#directories

    Parameters
    ----------
    x : float or list of floats
        One or more x coordinates at which to model tides. Assumes
        degrees longitude (EPSG:4326) by default; use `crs` to specify
        a different coordinate reference system.
    y : float or list of floats
        One or more y coordinates at which to model tides. Assumes
        degrees latitude (EPSG:4326) by default; use `crs` to specify
        a different coordinate reference system.
    time : DatetimeLike
        One or more UTC times at which to model tide heights. Accepts
        any time format compatible with `pandas.to_datetime()`, e.g.
        datetime.datetime, pd.Timestamp, pd.DatetimeIndex, numpy.datetime64,
        or date/time strings (e.g. "2020-01-01 23:00"). For example:
        `time = pd.date_range(start="2000", end="2001", freq="5h")`.
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        Defaults to "EOT20"; specify "all" to use all models available
        in `directory`. For a full list of available and supported models,
        run `from eo_tides.utils import list_models; list_models()`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    crs : str, optional
        Input coordinate reference system for x/y coordinates.
        Defaults to "EPSG:4326" (degrees latitude, longitude).
    mode : str, optional
        Tide modelling analysis mode. Supports two options:

        - `"one-to-many"`: Models tides at every x/y coordinate for
        every timestep in `time`. This is useful for Earth observation
        workflows where you want to model tides at many spatial points
        for a common set of acquisition times (e.g. satellite overpasses).

        - `"one-to-one"`: Model tides using one timestep for each x/y
        coordinate. In this mode, the number of x/y coordinates must
        match the number of timesteps in `time`.
    output_format : str, optional
        Whether to return the output dataframe in long format (with
        results stacked vertically along "tide_model" and "tide_height"
        columns), or wide format (with a column for each tide model).
        Defaults to "long".
    output_units : str, optional
        Units for the returned tide heights. Options are:

        - `"m"` (default): floating point values in metres
        - `"cm"`: integer values in centimetres (x100)
        - `"mm"`: integer values in millimetres (x1000)

        Using integer units can help reduce memory usage.
    method : str, optional
        Method used to interpolate tide model constituent files.
        Defaults to "linear"; options include:

        - `"linear"`, `"nearest"`: scipy regular grid interpolations
        - `"spline"`: scipy bivariate spline interpolation
        - `"bilinear"`: quick bilinear interpolation
    extrapolate : bool, optional
        If True (default), extrapolate tides inland of the valid tide
        model extent using nearest-neighbor interpolation. This can
        ensure tide are returned everywhere, but accuracy may degrade
        with distance from the valid model extent (e.g. inland or along
        complex estuaries or rivers). Set `cutoff` to define the
        maximum extrapolation distance.
    cutoff : float, optional
        Maximum distance in kilometres to extrapolate tides inland of the
        valid tide model extent. The default of None allows extrapolation
        at any (i.e. infinite) distance.
    crop : bool or str, optional
        Whether to crop tide model files on-the-fly to improve performance.
        Defaults to "auto", which enables cropping when supported (some
        clipped model files limited to the western hemisphere may not support
        on-the-fly cropping). Use `crop_buffer` to adjust the buffer
        distance used for cropping.
    crop_buffer : int or float, optional
        The buffer distance in degrees to crop tide model files around the
        requested x/y coordinates. Defaults to 5, which will crop model
        files using a five degree buffer.
    append_node : bool, optional
        Apply adjustments to harmonic constituents to allow for periodic
        modulations over the 18.6-year nodal period (lunar nodal tide).
        Default is False.
    constituents : list, optional
        Optional list of tide constituents to use for tide prediction.
        Default is None, which will use all available constituents.
    parallel : bool, optional
        Whether to parallelise tide modelling. If multiple tide models
        are requested, these will be run in parallel. If enough workers
        are available, the analysis will also be split into spatial
        chunks for additional parallelisation (see "parallel_splits"
        below). Default is True.
    parallel_splits : str or int, optional
        Whether to split the input x and y coordinates into smaller,
        evenly-sized chunks that are processed in parallel. This can
        provide a large performance boost when processing large numbers
        of coordinates. The default is "auto", which will automatically
        attempt to determine optimal splits based on available CPUs,
        the number of input points, and the number of models.
    parallel_max : int, optional
        Maximum number of processes to run in parallel. The default of
        None will automatically determine this from your available CPUs.
    ensemble_models : list of str, optional
        An optional list of models used to generate the ensemble tide
        model if "ensemble" tide modelling is requested. Defaults to
        `["EOT20", "FES2012", "FES2014_extrapolated", "FES2022_extrapolated",
        "GOT4.10", "GOT5.5_extrapolated", "GOT5.6_extrapolated",
        "TPXO10-atlas-v2-nc", "TPXO8-atlas-nc", "TPXO9-atlas-v5-nc"]`.
    extra_databases : str or path or list, optional
        Additional custom tide model definitions to load, provided as
        dictionaries or paths to JSON database files. Use this to
        enable custom tide models not included with `pyTMD`.
        See: https://pytmd.readthedocs.io/en/latest/getting_started/Getting-Started.html#model-database
    **ensemble_kwargs :
        Keyword arguments used to customise the generation of optional
        ensemble tide models if "ensemble" tide modelling is requested.
        These are passed to the underlying `_ensemble_model` function.
        Useful parameters include `ranking_points` (path to model
        rankings data), `k` (for controlling how model rankings are
        interpolated), and `ensemble_top_n` (how many top models to use
        in the ensemble calculation).

    Returns
    -------
    pandas.DataFrame
        A dataframe containing modelled tide heights.

    """
    # Turn inputs into arrays for consistent handling
    x = np.atleast_1d(x)
    y = np.atleast_1d(y)
    time = _standardise_time(time)

    # Validate input arguments
    if time is None:
        err_msg = "Times for modelling tides must be provided via `time`."
        raise ValueError(err_msg)

    if method not in ("bilinear", "spline", "linear", "nearest"):
        err_msg = (
            f"Invalid interpolation method '{method}'. Must be one of 'bilinear', 'spline', 'linear', or 'nearest'."
        )
        raise ValueError(err_msg)

    if output_units not in ("m", "cm", "mm"):
        err_msg = "Output units must be either 'm', 'cm', or 'mm'."
        raise ValueError(err_msg)

    if output_format not in ("long", "wide"):
        err_msg = "Output format must be either 'long' or 'wide'."
        raise ValueError(err_msg)

    if not np.issubdtype(x.dtype, np.number):
        err_msg = "`x` must contain only valid numeric values, and must not be None."
        raise TypeError(err_msg)

    if not np.issubdtype(y.dtype, np.number):
        err_msg = "`y` must contain only valid numeric values, and must not be None."
        raise TypeError(err_msg)

    if len(x) != len(y):
        err_msg = "`x` and `y` must be the same length."
        raise ValueError(err_msg)

    if mode == "one-to-one" and len(x) != len(time):
        err_msg = (
            "The number of supplied `x` and `y` points and `time` values must be "
            "identical in 'one-to-one' mode. Use 'one-to-many' mode if you intended "
            "to model multiple timesteps at each point."
        )
        raise ValueError(err_msg)

    # Set tide modelling files directory. If no custom path is
    # provided, try global environment variable.
    directory = _set_directory(directory)

    # Standardise model list, handling "all" and "ensemble" functionality,
    # and any custom tide model definitions
    models_to_process, models_requested, ensemble_models = _standardise_models(
        model=model,
        directory=directory,
        ensemble_models=ensemble_models,
        extra_databases=extra_databases,
    )

    # Update tide modelling func to add default keyword arguments that
    # are used for every iteration during parallel processing
    iter_func = partial(
        _model_tides,
        directory=directory,
        crs=crs,
        mode=mode,
        output_units=output_units,
        method=method,
        extrapolate=extrapolate,
        cutoff=np.inf if cutoff is None else cutoff,
        crop=crop,
        crop_buffer=crop_buffer,
        append_node=append_node,
        constituents=constituents,
        extra_databases=extra_databases,
    )

    # If automatic parallel splits, calculate optimal value
    # based on available parallelisation, number of points
    # and number of models
    if parallel_splits == "auto":
        parallel_splits = _parallel_splits(
            total_points=len(x),
            model_count=len(models_to_process),
            parallel_max=parallel_max,
        )

    # Verify that parallel splits are not larger than number of points
    assert isinstance(parallel_splits, int)  # noqa: S101
    if parallel_splits > len(x):
        err_msg = f"Parallel splits ({parallel_splits}) cannot be larger than the number of points ({len(x)})."
        raise ValueError(err_msg)

    # Parallelise if either multiple models or multiple splits requested
    if parallel & ((len(models_to_process) > 1) | (parallel_splits > 1)):
        with ProcessPoolExecutor(max_workers=parallel_max) as executor:
            print(
                f"Modelling tides with {', '.join(models_to_process)} in parallel (models: {len(models_to_process)}, splits: {parallel_splits})",
            )

            # Optionally split lon/lat points into `splits_n` chunks
            # that will be applied in parallel
            x_split = np.array_split(x, parallel_splits)
            y_split = np.array_split(y, parallel_splits)

            # Get every combination of models and lat/lon points, and
            # extract as iterables that can be passed to `executor.map()`
            # In "one-to-many" mode, pass entire set of timesteps to each
            # parallel iteration by repeating timesteps by number of total
            # parallel iterations. In "one-to-one" mode, split up
            # timesteps into smaller parallel chunks too.
            if mode == "one-to-many":
                model_iters, x_iters, y_iters = zip(
                    *[(m, x_split[i], y_split[i]) for m in models_to_process for i in range(parallel_splits)],
                    strict=False,
                )
                time_iters = [time] * len(model_iters)
            elif mode == "one-to-one":
                time_split = np.array_split(time, parallel_splits)
                model_iters, x_iters, y_iters, time_iters = zip(
                    *[
                        (m, x_split[i], y_split[i], time_split[i])
                        for m in models_to_process
                        for i in range(parallel_splits)
                    ],
                    strict=False,
                )

            # Apply func in parallel, iterating through each input param
            try:
                model_outputs = list(
                    tqdm(
                        executor.map(
                            iter_func,
                            model_iters,
                            x_iters,
                            y_iters,
                            time_iters,
                        ),
                        total=len(model_iters),
                    ),
                )
            except BrokenProcessPool:
                err_msg = (
                    "Parallelised tide modelling failed, likely to to an out-of-memory error. "
                    "Try reducing the size of your analysis, or set `parallel=False`."
                )
                raise RuntimeError(err_msg) from None

    # Model tides in series if parallelisation is off
    else:
        model_outputs = []

        for model_i in models_to_process:
            print(f"Modelling tides with {model_i}")
            tide_df = iter_func(model_i, x, y, time)
            model_outputs.append(tide_df)

    # Combine outputs into a single dataframe
    tide_df = pd.concat(model_outputs, axis=0)

    # Optionally compute ensemble model and add to dataframe
    if "ensemble" in models_requested:
        ensemble_df = ensemble_tides(tide_df, crs, ensemble_models, **ensemble_kwargs)

        # Update requested models with any custom ensemble models, then
        # filter the dataframe to keep only models originally requested
        models_requested = list(
            np.union1d(models_requested, ensemble_df.tide_model.unique()),
        )
        tide_df = pd.concat([tide_df, ensemble_df]).query(
            "tide_model in @models_requested",
        )

    # Optionally convert to a wide format dataframe with a tide model in
    # each dataframe column
    if output_format == "wide":
        # Pivot into wide format with each time model as a column
        print("Converting to a wide format dataframe")
        tide_df = tide_df.pivot(columns="tide_model", values="tide_height")  # noqa: PD010

        # If in 'one-to-one' mode, reindex using our input time/x/y
        # values to ensure the output is sorted the same as our inputs
        if mode == "one-to-one":
            output_indices = pd.MultiIndex.from_arrays(
                [time, x, y],
                names=["time", "x", "y"],
            )
            tide_df = tide_df.reindex(output_indices)

    return tide_df

eo_tides.eo ¶

Tools for integrating satellite EO data with tide modelling.

This module provides tools for combining satellite EO observations with tide heights and phases using one or more ocean tide models, both at the timestep and at the pixel level.

Functions:

Name	Description
`pixel_tides`	Model tide heights for every dataset pixel using multiple ocean tide models.
`tag_tides`	Model tide heights and phases for every dataset timestep using multiple ocean tide models.

pixel_tides ¶

pixel_tides(
    data,
    time=None,
    model="EOT20",
    directory=None,
    resample=True,
    calculate_quantiles=None,
    resolution=None,
    buffer=None,
    resample_method="bilinear",
    dask_chunks=None,
    dask_compute=True,
    **model_tides_kwargs
)

Model tide heights for every dataset pixel using multiple ocean tide models.

This function models tides into a low-resolution tide modelling grid covering the spatial extent of the input data (buffered to reduce potential edge effects). These modelled tides can then be resampled back into the original higher resolution dataset's extent and resolution to produce a modelled tide height for every pixel through time.

This function uses the parallelised model_tides function under the hood. It supports all tidal models supported by pyTMD, including:

Empirical Ocean Tide model (EOT20)
Finite Element Solution tide models (FES2022, FES2014, FES2012)
TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)
Technical University of Denmark tide models (DTU23)

This function requires access to tide model data files. For tide model setup instructions, refer to the guide: https://geoscienceaustralia.github.io/eo-tides/setup/

Parameters:

Name	Type	Description	Default
`data` ¶	`Dataset or DataArray or GeoBox`	A multi-dimensional dataset or GeoBox pixel grid that will be used to define the spatial tide modelling grid. If `data` is an xarray object, it should include a "time" dimension. If no "time" dimension exists or if `data` is a GeoBox, then times must be passed using the `time` parameter.	required
`time` ¶	`DatetimeLike`	By default, tides will be modelled using times from the "time" dimension of `data`. Alternatively, this param can be used to provide a custom set of times. Accepts any format that can be converted by `pandas.to_datetime()`. For example: `time=pd.date_range(start="2000", end="2001", freq="5h")`	`None`
`model` ¶	`str or list of str`	The tide model (or list of models) to use to model tides. If a list is provided, a new "tide_model" dimension will be added to the `xarray.DataArray` outputs. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, run `from eo_tides.utils import list_models; list_models()`.	`'EOT20'`
`directory` ¶	`str`	The directory containing tide model data files. If no path is provided, this will default to the environment variable `EO_TIDES_TIDE_MODELS` if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by `pyTMD` (https://geoscienceaustralia.github.io/eo-tides/setup/).	`None`
`resample` ¶	`bool`	Whether to resample low resolution tides back into `data`'s original higher resolution grid. Set this to `False` if you do not want low resolution tides to be re-projected back to higher resolution.	`True`
`calculate_quantiles` ¶	`tuple of float or numpy.ndarray`	Rather than returning all individual tides, low-resolution tides can be first aggregated using a quantile calculation by passing in a tuple or array of quantiles to compute. For example, this could be used to calculate the min/max tide across all times: `calculate_quantiles=(0.0, 1.0)`.	`None`
`resolution` ¶	`float`	The desired resolution of the low-resolution grid used for tide modelling. The default None will create a 5000 m resolution grid if `data` has a projected CRS (i.e. metre units), or a 0.05 degree resolution grid if `data` has a geographic CRS (e.g. degree units). Note: higher resolutions do not necessarily provide better tide modelling performance, as results will be limited by the resolution of the underlying global tide model (e.g. 1/16th degree / ~5 km resolution grid for FES2014).	`None`
`buffer` ¶	`float`	The amount by which to buffer the higher resolution grid extent when creating the new low resolution grid. This buffering ensures that modelled tides are seamless across analysis boundaries. This buffer is eventually be clipped away when the low-resolution modelled tides are re-projected back to the original resolution and extent of `data`. To ensure that at least two low-resolution grid pixels occur outside of the dataset bounds, the default None applies a 12000 m buffer if `data` has a projected CRS (i.e. metre units), or a 0.12 degree buffer if `data` has a geographic CRS (e.g. degree units).	`None`
`resample_method` ¶	`str`	If resampling is requested (see `resample` above), use this resampling method when resampling from low resolution to high resolution pixels. Defaults to "bilinear"; valid options include "nearest", "cubic", "min", "max", "average" etc.	`'bilinear'`
`dask_chunks` ¶	`tuple of float`	Can be used to configure custom Dask chunking for the final resampling step. By default, chunks will be automatically set to match y/x chunks from `data` if they exist; otherwise chunks will be chosen to cover the entire y/x extent of the dataset. For custom chunks, provide a tuple in the form `(y, x)`, e.g. `(2048, 2048)`.	`None`
`dask_compute` ¶	`bool`	Whether to compute results of the resampling step using Dask. If False, `tides_highres` will be returned as a Dask-enabled array.	`True`
`**model_tides_kwargs` ¶		Optional parameters passed to the `eo_tides.model.model_tides` function. Important parameters include `cutoff` (used to extrapolate modelled tides away from the coast; defaults to `np.inf`), `crop` (whether to crop tide model constituent files on-the-fly to improve performance) etc.	`{}`

Returns:

Name	Type	Description
`tides_da`	`DataArray`	A three-dimensional tide height array. If `resample=True` (default), a high-resolution array of tide heights will be returned that matches the exact spatial resolution and extents of `data`. This will contain either tide heights for every timestep in `data` (or in `times` if provided), or tide height quantiles for every quantile provided by `calculate_quantiles`. If `resample=False`, results for the intermediate low-resolution tide modelling grid will be returned instead.

Source code in eo_tides/eo.py

def pixel_tides(
    data: xr.Dataset | xr.DataArray | GeoBox,
    time: DatetimeLike | None = None,
    model: str | list[str] = "EOT20",
    directory: str | os.PathLike | None = None,
    resample: bool = True,
    calculate_quantiles: np.ndarray | tuple[float, float] | None = None,
    resolution: float | None = None,
    buffer: float | None = None,
    resample_method: str = "bilinear",
    dask_chunks: tuple[float, float] | None = None,
    dask_compute: bool = True,
    **model_tides_kwargs,
) -> xr.DataArray:
    """Model tide heights for every dataset pixel using multiple ocean tide models.

    This function models tides into a low-resolution tide
    modelling grid covering the spatial extent of the input
    data (buffered to reduce potential edge effects). These
    modelled tides can then be resampled back into the original
    higher resolution dataset's extent and resolution to
    produce a modelled tide height for every pixel through time.

    This function uses the parallelised `model_tides` function
    under the hood. It supports all tidal models supported by
    `pyTMD`, including:

    - Empirical Ocean Tide model (EOT20)
    - Finite Element Solution tide models (FES2022, FES2014, FES2012)
    - TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
    - Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
    - Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)
    - Technical University of Denmark tide models (DTU23)

    This function requires access to tide model data files.
    For tide model setup instructions, refer to the guide:
    https://geoscienceaustralia.github.io/eo-tides/setup/

    Parameters
    ----------
    data : xarray.Dataset or xarray.DataArray or odc.geo.geobox.GeoBox
        A multi-dimensional dataset or GeoBox pixel grid that will
        be used to define the spatial tide modelling grid. If `data`
        is an xarray object, it should include a "time" dimension.
        If no "time" dimension exists or if `data` is a GeoBox,
        then times must be passed using the `time` parameter.
    time : DatetimeLike, optional
        By default, tides will be modelled using times from the
        "time" dimension of `data`. Alternatively, this param can
        be used to provide a custom set of times. Accepts any format
        that can be converted by `pandas.to_datetime()`. For example:
        `time=pd.date_range(start="2000", end="2001", freq="5h")`
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        If a list is provided, a new "tide_model" dimension will be
        added to the `xarray.DataArray` outputs. Defaults to "EOT20";
        specify "all" to use all models available in `directory`.
        For a full list of available and supported models, run
        `from eo_tides.utils import list_models; list_models()`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    resample : bool, optional
        Whether to resample low resolution tides back into `data`'s original
        higher resolution grid. Set this to `False` if you do not want
        low resolution tides to be re-projected back to higher resolution.
    calculate_quantiles : tuple of float or numpy.ndarray, optional
        Rather than returning all individual tides, low-resolution tides
        can be first aggregated using a quantile calculation by passing in
        a tuple or array of quantiles to compute. For example, this could
        be used to calculate the min/max tide across all times:
        `calculate_quantiles=(0.0, 1.0)`.
    resolution : float, optional
        The desired resolution of the low-resolution grid used for tide
        modelling. The default None will create a 5000 m resolution grid
        if `data` has a projected CRS (i.e. metre units), or a 0.05 degree
        resolution grid if `data` has a geographic CRS (e.g. degree units).
        Note: higher resolutions do not necessarily provide better
        tide modelling performance, as results will be limited by the
        resolution of the underlying global tide model (e.g. 1/16th
        degree / ~5 km resolution grid for FES2014).
    buffer : float, optional
        The amount by which to buffer the higher resolution grid extent
        when creating the new low resolution grid. This buffering
        ensures that modelled tides are seamless across analysis
        boundaries. This buffer is eventually be clipped away when
        the low-resolution modelled tides are re-projected back to the
        original resolution and extent of `data`. To ensure that at least
        two low-resolution grid pixels occur outside of the dataset
        bounds, the default None applies a 12000 m buffer if `data` has a
        projected CRS (i.e. metre units), or a 0.12 degree buffer if
        `data` has a geographic CRS (e.g. degree units).
    resample_method : str, optional
        If resampling is requested (see `resample` above), use this
        resampling method when resampling from low resolution to high
        resolution pixels. Defaults to "bilinear"; valid options include
        "nearest", "cubic", "min", "max", "average" etc.
    dask_chunks : tuple of float, optional
        Can be used to configure custom Dask chunking for the final
        resampling step. By default, chunks will be automatically set
        to match y/x chunks from `data` if they exist; otherwise chunks
        will be chosen to cover the entire y/x extent of the dataset.
        For custom chunks, provide a tuple in the form `(y, x)`, e.g.
        `(2048, 2048)`.
    dask_compute : bool, optional
        Whether to compute results of the resampling step using Dask.
        If False, `tides_highres` will be returned as a Dask-enabled array.
    **model_tides_kwargs :
        Optional parameters passed to the `eo_tides.model.model_tides`
        function. Important parameters include `cutoff` (used to
        extrapolate modelled tides away from the coast; defaults to
        `np.inf`), `crop` (whether to crop tide model constituent files
        on-the-fly to improve performance) etc.

    Returns
    -------
    tides_da : xr.DataArray
        A three-dimensional tide height array.
        If `resample=True` (default), a high-resolution array of tide
        heights will be returned that matches the exact spatial resolution
        and extents of `data`. This will contain either tide heights for
        every timestep in `data` (or in `times` if provided), or tide height
        quantiles for every quantile provided by `calculate_quantiles`.
        If `resample=False`, results for the intermediate low-resolution
        tide modelling grid will be returned instead.

    """
    # Standardise data inputs, time and models
    gbox, time_coords = _standardise_inputs(data, time)
    dask_chunks = _resample_chunks(data, dask_chunks)
    model = [model] if isinstance(model, str) else model

    # Determine spatial dimensions
    y_dim, x_dim = gbox.dimensions

    # Determine resolution and buffer, using different defaults for
    # geographic (i.e. degrees) and projected (i.e. metres) CRSs:
    assert gbox.crs is not None  # noqa: S101
    crs_units = gbox.crs.units[0][0:6]
    if gbox.crs.geographic:
        if resolution is None:
            resolution = 0.05
        elif resolution > 360:
            err_msg = (
                f"A resolution of greater than 360 was "
                f"provided, but `data` has a geographic CRS "
                f"in {crs_units} units. Did you accidently "
                f"provide a resolution in projected "
                f"(i.e. metre) units?",
            )
            raise ValueError(err_msg)
        if buffer is None:
            buffer = 0.12
    else:
        if resolution is None:
            resolution = 5000
        elif resolution < 1:
            err_msg = (
                f"A resolution of less than 1 was provided, "
                f"but `data` has a projected CRS in "
                f"{crs_units} units. Did you accidently "
                f"provide a resolution in geographic "
                f"(degree) units?",
            )
            raise ValueError(err_msg)
        if buffer is None:
            buffer = 12000

    # Raise error if resolution is less than dataset resolution
    dataset_res = gbox.resolution.x
    if resolution < dataset_res:
        err_msg = (
            f"The resolution of the low-resolution tide "
            f"modelling grid ({resolution:.2f}) is less "
            f"than `data`'s pixel resolution ({dataset_res:.2f}). "
            f"This can cause extremely slow tide modelling "
            f"performance. Please select provide a resolution "
            f"greater than {dataset_res:.2f} using "
            f"`pixel_tides`'s 'resolution' parameter.",
        )
        raise ValueError(err_msg)

    # Create a new reduced resolution tide modelling grid after
    # first buffering the grid
    print(f"Creating reduced resolution {resolution} x {resolution} {crs_units} tide modelling array")
    buffered_geobox = gbox.buffered(buffer)
    rescaled_geobox = GeoBox.from_bbox(bbox=buffered_geobox.boundingbox, resolution=resolution)
    rescaled_ds = odc.geo.xr.xr_zeros(rescaled_geobox)

    # Flatten grid to 1D, then add time dimension
    flattened_ds = rescaled_ds.stack(z=(x_dim, y_dim))  # noqa: PD013
    flattened_ds = flattened_ds.expand_dims(dim={"time": time_coords})

    # Model tides in parallel, returning a pandas.DataFrame
    tide_df = model_tides(
        x=flattened_ds[x_dim],
        y=flattened_ds[y_dim],
        time=flattened_ds.time,
        crs=f"EPSG:{gbox.crs.epsg}",
        model=model,
        directory=directory,
        **model_tides_kwargs,
    )

    # Convert our pandas.DataFrame tide modelling outputs to xarray
    tides_lowres = (
        # Rename x and y dataframe indexes to match x and y xarray dims
        tide_df.rename_axis(["time", x_dim, y_dim])
        # Add tide model column to dataframe indexes so we can convert
        # our dataframe to a multidimensional xarray
        .set_index("tide_model", append=True)
        # Convert to xarray and select our tide modelling xr.DataArray
        .to_xarray()
        .tide_height
        # Re-index and transpose into our input coordinates and dim order
        .reindex_like(rescaled_ds)
        .transpose("tide_model", "time", y_dim, x_dim)
    )

    # Optionally calculate and return quantiles rather than raw data.
    # Set dtype to dtype of the input data as quantile always returns
    # float64 (memory intensive)
    if calculate_quantiles is not None:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            print("Computing tide quantiles")
            tides_lowres = tides_lowres.quantile(q=calculate_quantiles, dim="time").astype(tides_lowres.dtype)

    # If only one tidal model exists, squeeze out "tide_model" dim
    if len(tides_lowres.tide_model) == 1:
        tides_lowres = tides_lowres.squeeze("tide_model")

    # Ensure CRS is present before we apply any resampling
    tides_lowres = tides_lowres.odc.assign_crs(gbox.crs)

    # Reproject into original high resolution grid
    if resample:
        print("Reprojecting tides into original resolution")
        return _pixel_tides_resample(
            tides_lowres,
            gbox,
            resample_method,
            dask_chunks,
            dask_compute,
        )

    print("Returning low resolution tide array")
    return tides_lowres

tag_tides ¶

tag_tides(
    data,
    time=None,
    model="EOT20",
    directory=None,
    tidepost_lat=None,
    tidepost_lon=None,
    return_phases=False,
    **model_tides_kwargs
)

Model tide heights and phases for every dataset timestep using multiple ocean tide models.

Tides are modelled using the centroid of the dataset by default; use tidepost_lat and tidepost_lon to specify a custom tidal modelling location.

The function supports all tidal models supported by pyTMD, including:

Empirical Ocean Tide model (EOT20)
Finite Element Solution tide models (FES2022, FES2014, FES2012)
TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)
Technical University of Denmark tide models (DTU23)

This function requires access to tide model data files. For tide model setup instructions, refer to the guide: https://geoscienceaustralia.github.io/eo-tides/setup/

Parameters:

Name	Type	Description	Default
`data` ¶	`Dataset or DataArray or GeoBox`	A multi-dimensional dataset or GeoBox pixel grid that will be used to define the tide modelling location. If `data` is an xarray object, it should include a "time" dimension. If no "time" dimension exists or if `data` is a GeoBox, then times must be passed using the `time` parameter.	required
`time` ¶	`DatetimeLike`	By default, tides will be modelled using times from the "time" dimension of `data`. Alternatively, this param can be used to provide a custom set of times. Accepts any format that can be converted by `pandas.to_datetime()`. For example: `time=pd.date_range(start="2000", end="2001", freq="5h")`	`None`
`model` ¶	`str or list of str`	The tide model (or list of models) to use to model tides. If a list is provided, a new "tide_model" dimension will be added to the `xarray.DataArray` outputs. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, run `from eo_tides.utils import list_models; list_models()`.	`'EOT20'`
`directory` ¶	`str`	The directory containing tide model data files. If no path is provided, this will default to the environment variable `EO_TIDES_TIDE_MODELS` if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by `pyTMD` (https://geoscienceaustralia.github.io/eo-tides/setup/).	`None`
`tidepost_lat` ¶	`float`	Optional coordinates used to model tides. The default is None, which uses the centroid of the dataset as the tide modelling location.	`None`
`tidepost_lon` ¶	`float`	Optional coordinates used to model tides. The default is None, which uses the centroid of the dataset as the tide modelling location.	`None`
`return_phases` ¶	`bool`	Whether to model and return tide phases in addition to tide heights. If True, outputs will be returned as an xr.Dataset containing both "tide_height" and "tide_phase" variables.	`False`
`**model_tides_kwargs` ¶		Optional parameters passed to the `eo_tides.model.model_tides` function. Important parameters include `cutoff` (used to extrapolate modelled tides away from the coast; defaults to `np.inf`), `crop` (whether to crop tide model constituent files on-the-fly to improve performance) etc.	`{}`

Returns:

Name	Type	Description
`tides_da`	`DataArray or Dataset`	If `return_phases=False`: a one-dimensional "tide_height" xr.DataArray. If `return_phases=True`: a one-dimensional xr.Dataset containing "tide_height" and "tide_phase" variables. Outputs will contain values for every timestep in `data`, or for every time in `times` if provided.

Source code in eo_tides/eo.py

def tag_tides(
    data: xr.Dataset | xr.DataArray | GeoBox,
    time: DatetimeLike | None = None,
    model: str | list[str] = "EOT20",
    directory: str | os.PathLike | None = None,
    tidepost_lat: float | None = None,
    tidepost_lon: float | None = None,
    return_phases: bool = False,
    **model_tides_kwargs,
) -> xr.DataArray | xr.Dataset:
    """Model tide heights and phases for every dataset timestep using multiple ocean tide models.

    Tides are modelled using the centroid of the dataset by
    default; use `tidepost_lat` and `tidepost_lon` to specify
    a custom tidal modelling location.

    The function supports all tidal models supported by `pyTMD`,
    including:

    - Empirical Ocean Tide model (EOT20)
    - Finite Element Solution tide models (FES2022, FES2014, FES2012)
    - TOPEX/POSEIDON global tide models (TPXO10, TPXO9, TPXO8)
    - Global Ocean Tide models (GOT5.6, GOT5.5, GOT4.10, GOT4.8, GOT4.7)
    - Hamburg direct data Assimilation Methods for Tides models (HAMTIDE11)
    - Technical University of Denmark tide models (DTU23)

    This function requires access to tide model data files.
    For tide model setup instructions, refer to the guide:
    https://geoscienceaustralia.github.io/eo-tides/setup/

    Parameters
    ----------
    data : xarray.Dataset or xarray.DataArray or odc.geo.geobox.GeoBox
        A multi-dimensional dataset or GeoBox pixel grid that will
        be used to define the tide modelling location. If `data`
        is an xarray object, it should include a "time" dimension.
        If no "time" dimension exists or if `data` is a GeoBox,
        then times must be passed using the `time` parameter.
    time : DatetimeLike, optional
        By default, tides will be modelled using times from the
        "time" dimension of `data`. Alternatively, this param can
        be used to provide a custom set of times. Accepts any format
        that can be converted by `pandas.to_datetime()`. For example:
        `time=pd.date_range(start="2000", end="2001", freq="5h")`
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        If a list is provided, a new "tide_model" dimension will be
        added to the `xarray.DataArray` outputs. Defaults to "EOT20";
        specify "all" to use all models available in `directory`.
        For a full list of available and supported models, run
        `from eo_tides.utils import list_models; list_models()`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    tidepost_lat, tidepost_lon : float, optional
        Optional coordinates used to model tides. The default is None,
        which uses the centroid of the dataset as the tide modelling
        location.
    return_phases : bool, optional
        Whether to model and return tide phases in addition to tide heights.
        If True, outputs will be returned as an xr.Dataset containing both
        "tide_height" and "tide_phase" variables.
    **model_tides_kwargs :
        Optional parameters passed to the `eo_tides.model.model_tides`
        function. Important parameters include `cutoff` (used to
        extrapolate modelled tides away from the coast; defaults to
        `np.inf`), `crop` (whether to crop tide model constituent files
        on-the-fly to improve performance) etc.

    Returns
    -------
    tides_da : xr.DataArray or xr.Dataset
        If `return_phases=False`: a one-dimensional "tide_height" xr.DataArray.
        If `return_phases=True`: a one-dimensional xr.Dataset containing
        "tide_height" and "tide_phase" variables.
        Outputs will contain values for every timestep in `data`, or for
        every time in `times` if provided.

    """
    # Standardise data inputs, time and models
    gbox, time_coords = _standardise_inputs(data, time)
    model = [model] if isinstance(model, str) else model

    # If custom tide posts are not provided, use dataset centroid
    if tidepost_lat is None or tidepost_lon is None:
        lon, lat = gbox.geographic_extent.centroid.coords[0]
        print(f"Setting tide modelling location from dataset centroid: {lon:.2f}, {lat:.2f}")
    else:
        lon, lat = tidepost_lon, tidepost_lat
        print(f"Using tide modelling location: {lon:.2f}, {lat:.2f}")

    # Either model both tides and phases, or model only tides
    if return_phases:
        # Model tide phases and heights for each observation
        tide_df = model_phases(
            x=lon,
            y=lat,
            time=time_coords,
            model=model,
            directory=directory,
            crs="EPSG:4326",
            return_tides=True,
            **model_tides_kwargs,
        )

    else:
        # Model tide heights for each observation
        tide_df = model_tides(
            x=lon,
            y=lat,
            time=time_coords,
            model=model,
            directory=directory,
            crs="EPSG:4326",
            **model_tides_kwargs,
        )

    # If tides cannot be successfully modeled (e.g. if the centre of the
    # xarray dataset is located is over land), raise an exception
    if tide_df.tide_height.isna().all():
        err_msg = (
            f"Tides could not be modelled for dataset centroid located "
            f"at {tidepost_lon:.2f}, {tidepost_lat:.2f}. This can occur if "
            f"this coordinate occurs over land. Please manually specify "
            f"a tide modelling location located over water using the "
            f"`tidepost_lat` and `tidepost_lon` parameters.",
        )
        raise ValueError(err_msg)

    # Convert to xarray format, squeezing to return an xr.DataArray if
    # dataframe contains only one "tide_height" column
    tides_da = tide_df.reset_index().set_index(["time", "tide_model"]).drop(["x", "y"], axis=1).squeeze().to_xarray()

    # If only one tidal model exists, squeeze out "tide_model" dim
    if len(tides_da.tide_model) == 1:
        tides_da = tides_da.squeeze("tide_model")

    return tides_da

eo_tides.stats ¶

Tools for analysing local tide dynamics and satellite biases.

This module provides functions to assess how well satellite EO data captures real-world tides, and reveals potential tide biases in satellite EO data coverage.

Functions:

Name	Description
`pixel_stats`	Generate tide statistics and satellite tide bias metrics for every dataset pixel.
`tide_aliasing`	Calculate aliasing periods for tidal constituents given satellite revisit intervals.
`tide_stats`	Generate tide statistics and satellite tide bias metrics for every dataset timestep.

pixel_stats ¶

pixel_stats(
    data,
    time=None,
    model="EOT20",
    directory=None,
    resample=True,
    modelled_freq="3h",
    min_max_q=(0.0, 1.0),
    resample_method="bilinear",
    dask_chunks=None,
    dask_compute=True,
    extrapolate=True,
    cutoff=10,
    **pixel_tides_kwargs
)

Generate tide statistics and satellite tide bias metrics for every dataset pixel.

Takes a multi-dimensional dataset and generate pixel-level tide statistics and satellite-observed tide bias metrics, calculated based on every timestep in the satellite data and modelled into the spatial extent of the imagery.

By comparing the subset of tides observed by satellites against the full astronomical tidal range, we can evaluate whether the tides observed by satellites are biased (e.g. fail to observe either the highest or lowest tides) due to tide aliasing interactions with sun-synchronous satellite overpasses.

Compared to tide_stats, this function models tide metrics spatially to produce a two-dimensional output for each statistic.

For more information about the tidal statistics computed by this function, refer to Figure 8 in Bishop-Taylor et al. 2018: https://www.sciencedirect.com/science/article/pii/S0272771418308783#fig8

Parameters:

Name	Type	Description	Default
`data` ¶	`Dataset or DataArray or GeoBox`	A multi-dimensional dataset or GeoBox pixel grid that will be used to calculate spatial tide statistics. If `data` is an xarray object, it should include a "time" dimension. If no "time" dimension exists or if `data` is a GeoBox, then times must be passed using the `time` parameter.	required
`time` ¶	`DatetimeLike`	By default, tides will be modelled using times from the "time" dimension of `data`. Alternatively, this param can be used to provide a custom set of times. Accepts any format that can be converted by `pandas.to_datetime()`. For example: `time=pd.date_range(start="2000", end="2001", freq="5h")`	`None`
`model` ¶	`str or list of str`	The tide model (or list of models) to use to model tides. If a list is provided, a new "tide_model" dimension will be added to the `xarray.Dataset` output. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, run `from eo_tides.utils import list_models; list_models()`.	`'EOT20'`
`directory` ¶	`str`	The directory containing tide model data files. If no path is provided, this will default to the environment variable `EO_TIDES_TIDE_MODELS` if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by `pyTMD` (https://geoscienceaustralia.github.io/eo-tides/setup/).	`None`
`resample` ¶	`bool`	Whether to resample tide statistics back into `data`'s original higher resolution grid. Set this to `False` if you want to return lower-resolution tide statistics (which can be useful for assessing tide biases across large spatial extents).	`True`
`modelled_freq` ¶	`str`	An optional string giving the frequency at which to model tides when computing the full modelled tidal range. Defaults to '3h', which computes a tide height for every three hours across the temporal extent of `data`.	`'3h'`
`min_max_q` ¶	`tuple`	Quantiles used to calculate max and min observed and modelled astronomical tides. By default `(0.0, 1.0)` which is equivalent to minimum and maximum; for a softer threshold that is more robust to outliers use e.g. `(0.1, 0.9)`.	`(0.0, 1.0)`
`resample_method` ¶	`str`	If resampling is requested (see `resample` above), use this resampling method when resampling from low resolution to high resolution pixels. Defaults to "bilinear"; valid options include "nearest", "cubic", "min", "max", "average" etc.	`'bilinear'`
`dask_chunks` ¶	`tuple of float`	Can be used to configure custom Dask chunking for the final resampling step. By default, chunks will be automatically set to match y/x chunks from `data` if they exist; otherwise chunks will be chosen to cover the entire y/x extent of the dataset. For custom chunks, provide a tuple in the form `(y, x)`, e.g. `(2048, 2048)`.	`None`
`dask_compute` ¶	`bool`	Whether to compute results of the resampling step using Dask. If False, `stats_ds` will be returned as a Dask-enabled array.	`True`
`extrapolate` ¶	`bool`	Whether to extrapolate tides into x and y coordinates outside of the valid tide modelling domain using nearest-neighbor. Defaults to True.	`True`
`cutoff` ¶	`float`	Extrapolation cutoff in kilometers. To avoid producing tide statistics too far inland, the default is 10 km.	`10`
`**pixel_tides_kwargs` ¶		Optional parameters passed to the `eo_tides.eo.pixel_tides` function.	`{}`

Returns:

Name	Type	Description
`stats_ds`	`Dataset`	An `xarray.Dataset` containing the following statistics as two-dimensional data variables: `mot`: mean tide height observed by the satellite (metres) `mat`: mean modelled astronomical tide height (metres) `lot`: minimum tide height observed by the satellite (metres) `lat`: minimum tide height from modelled astronomical tidal range (metres) `hot`: maximum tide height observed by the satellite (metres) `hat`: maximum tide height from modelled astronomical tidal range (metres) `otr`: tidal range observed by the satellite (metres) `tr`: modelled astronomical tide range (metres) `spread`: proportion of the full modelled tidal range observed by the satellite `offset_low`: proportion of the lowest tides never observed by the satellite `offset_high`: proportion of the highest tides never observed by the satellite

Source code in eo_tides/stats.py

def pixel_stats(
    data: xr.Dataset | xr.DataArray | GeoBox,
    time: DatetimeLike | None = None,
    model: str | list[str] = "EOT20",
    directory: str | os.PathLike | None = None,
    resample: bool = True,
    modelled_freq: str = "3h",
    min_max_q: tuple[float, float] = (0.0, 1.0),
    resample_method: str = "bilinear",
    dask_chunks: tuple[float, float] | None = None,
    dask_compute: bool = True,
    extrapolate: bool = True,
    cutoff: float = 10,
    **pixel_tides_kwargs,
) -> xr.Dataset:
    """Generate tide statistics and satellite tide bias metrics for every dataset pixel.

    Takes a multi-dimensional dataset and generate pixel-level
    tide statistics and satellite-observed tide bias metrics,
    calculated based on every timestep in the satellite data and
    modelled into the spatial extent of the imagery.

    By comparing the subset of tides observed by satellites
    against the full astronomical tidal range, we can evaluate
    whether the tides observed by satellites are biased
    (e.g. fail to observe either the highest or lowest tides)
    due to tide aliasing interactions with sun-synchronous satellite
    overpasses.

    Compared to `tide_stats`, this function models tide metrics
    spatially to produce a two-dimensional output for each statistic.

    For more information about the tidal statistics computed by this
    function, refer to Figure 8 in Bishop-Taylor et al. 2018:
    <https://www.sciencedirect.com/science/article/pii/S0272771418308783#fig8>

    Parameters
    ----------
    data : xarray.Dataset or xarray.DataArray or odc.geo.geobox.GeoBox
        A multi-dimensional dataset or GeoBox pixel grid that will
        be used to calculate spatial tide statistics. If `data`
        is an xarray object, it should include a "time" dimension.
        If no "time" dimension exists or if `data` is a GeoBox,
        then times must be passed using the `time` parameter.
    time : DatetimeLike, optional
        By default, tides will be modelled using times from the
        "time" dimension of `data`. Alternatively, this param can
        be used to provide a custom set of times. Accepts any format
        that can be converted by `pandas.to_datetime()`. For example:
        `time=pd.date_range(start="2000", end="2001", freq="5h")`
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        If a list is provided, a new "tide_model" dimension will be
        added to the `xarray.Dataset` output. Defaults to "EOT20";
        specify "all" to use all models available in `directory`.
        For a full list of available and supported models, run
        `from eo_tides.utils import list_models; list_models()`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    resample : bool, optional
        Whether to resample tide statistics back into `data`'s original
        higher resolution grid. Set this to `False` if you want to return
        lower-resolution tide statistics (which can be useful for
        assessing tide biases across large spatial extents).
    modelled_freq : str, optional
        An optional string giving the frequency at which to model tides
        when computing the full modelled tidal range. Defaults to '3h',
        which computes a tide height for every three hours across the
        temporal extent of `data`.
    min_max_q : tuple, optional
        Quantiles used to calculate max and min observed and modelled
        astronomical tides. By default `(0.0, 1.0)` which is equivalent
        to minimum and maximum; for a softer threshold that is more
        robust to outliers use e.g. `(0.1, 0.9)`.
    resample_method : str, optional
        If resampling is requested (see `resample` above), use this
        resampling method when resampling from low resolution to high
        resolution pixels. Defaults to "bilinear"; valid options include
        "nearest", "cubic", "min", "max", "average" etc.
    dask_chunks : tuple of float, optional
        Can be used to configure custom Dask chunking for the final
        resampling step. By default, chunks will be automatically set
        to match y/x chunks from `data` if they exist; otherwise chunks
        will be chosen to cover the entire y/x extent of the dataset.
        For custom chunks, provide a tuple in the form `(y, x)`, e.g.
        `(2048, 2048)`.
    dask_compute : bool, optional
        Whether to compute results of the resampling step using Dask.
        If False, `stats_ds` will be returned as a Dask-enabled array.
    extrapolate : bool, optional
        Whether to extrapolate tides into x and y coordinates outside of
        the valid tide modelling domain using nearest-neighbor. Defaults
        to True.
    cutoff : float, optional
        Extrapolation cutoff in kilometers. To avoid producing tide
        statistics too far inland, the default is 10 km.
    **pixel_tides_kwargs :
        Optional parameters passed to the `eo_tides.eo.pixel_tides`
        function.

    Returns
    -------
    stats_ds : xarray.Dataset
        An `xarray.Dataset` containing the following statistics as two-dimensional data variables:

        - `mot`: mean tide height observed by the satellite (metres)
        - `mat`: mean modelled astronomical tide height (metres)
        - `lot`: minimum tide height observed by the satellite (metres)
        - `lat`: minimum tide height from modelled astronomical tidal range (metres)
        - `hot`: maximum tide height observed by the satellite (metres)
        - `hat`: maximum tide height from modelled astronomical tidal range (metres)
        - `otr`: tidal range observed by the satellite (metres)
        - `tr`: modelled astronomical tide range (metres)
        - `spread`: proportion of the full modelled tidal range observed by the satellite
        - `offset_low`: proportion of the lowest tides never observed by the satellite
        - `offset_high`: proportion of the highest tides never observed by the satellite

    """
    # Standardise data inputs, time and models
    gbox, obs_times = _standardise_inputs(data, time)
    dask_chunks = _resample_chunks(data, dask_chunks)
    model = [model] if isinstance(model, str) else model

    # Generate range of times covering entire period of satellite record
    assert obs_times is not None  # noqa: S101
    all_times = pd.date_range(
        start=obs_times.min().item(),
        end=obs_times.max().item(),
        freq=modelled_freq,
    )

    # Model tides for observed timesteps
    obs_tides_da = pixel_tides(
        gbox,
        time=obs_times,
        model=model,
        directory=directory,
        resample=False,
        extrapolate=extrapolate,
        cutoff=cutoff,
        **pixel_tides_kwargs,
    )

    # Model tides for all modelled timesteps
    all_tides_da = pixel_tides(
        gbox,
        time=all_times,
        model=model,
        directory=directory,
        resample=False,
        extrapolate=extrapolate,
        cutoff=cutoff,
        **pixel_tides_kwargs,
    )

    # Calculate statistics
    stats_lowres = _tide_statistics(obs_tides_da, all_tides_da, min_max_q=min_max_q)

    # Assign CRS and geobox to allow reprojection
    stats_lowres = stats_lowres.odc.assign_crs(crs=gbox.crs)

    # Reproject statistics into original high resolution grid
    if resample:
        print("Reprojecting statistics into original resolution")
        return _pixel_tides_resample(
            stats_lowres,
            gbox,
            resample_method,
            dask_chunks,
            dask_compute,
            None,
        )

    print("Returning low resolution statistics array")
    return stats_lowres

tide_aliasing ¶

tide_aliasing(
    satellites,
    c=None,
    units="days",
    max_inf=1000000,
    style=True,
)

Calculate aliasing periods for tidal constituents given satellite revisit intervals.

This function uses pyTMD.arguments.aliasing_period to calculate the aliasing periods between satellite overpass periods and the natural cycles of tidal constituents. The aliasing period describes how long it would take for a satellite to sample the entire tidal cycle for each constituent, based on the satellite's observation frequency.

Short aliasing periods mean the satellite will observe the full range of tidal variation relatively quickly, reducing the risk of tide-related bias. Long aliasing periods indicate that it will take much longer to sample all tidal conditions, increasing the risk that satellite analyses may misrepresent tidal dynamics.

Revisit periods are approximate and based on nominal repeat cycles at the equator. Actual observation frequency may vary due to latitude, cloud cover, sensor availability, and acquisition planning. Custom revisit intervals can be defined by passing a dictionary to satellites, e.g. satellites={"custom-sat": 5}.

For more information, refer to https://pytmd.readthedocs.io/en/latest/api_reference/arguments.html#pyTMD.arguments.aliasing_period

Parameters:

Name	Type	Description	Default
`satellites` ¶	`list of str or dict`	List of satellite names to analyse, or a custom dictionary with satellite names as keys and revisit frequency in days as values. Supported satellites include: Landsat (optical): Two satellites combined: "landsat" Individual: "landsat-5", "landsat-7", "landsat-8", "landsat-9" Sentinel-2 (optical): Two satellites combined: "sentinel-2" Individual: "sentinel-2a", "sentinel-2b", "sentinel-2c" Sentinel-1 (C-band SAR): Two satellites combined: "sentinel-1" Individual: "sentinel-1a", "sentinel-1b", "sentinel-1c" Sentinel-3: OLCI (optical): "sentinel-3a-olci", "sentinel-3b-olci", "sentinel-3c-olci" SLSTR (thermal): "sentinel-3a-slstr", "sentinel-3b-slstr, "sentinel-3c-slstr" SRAL (altimetry): "sentinel-3a-sral", "sentinel-3b-sral, "sentinel-3c-sral" SWOT (KaRIn swath altimetry): "swot" NISAR (L- and S-band SAR): "nisar"	required
`c` ¶	`list of str or None`	List of tidal constituents to include. If None, all known constituents are included. Constituent names should be lowercase (e.g., "m2", "k1").	`None`
`units` ¶	`str`	Output time units for the aliasing periods. Must be one of: "years", "days", "hours", or "minutes". Default is "days".	`'days'`
`max_inf` ¶	`float`	Maximum aliasing period to display. Values exceeding this threshold are replaced with `np.inf`. Default is 1,000,000.	`1000000`
`style` ¶	`bool`	If True, returns a styled `pandas.DataFrame`. If False, returns a raw DataFrame. Default is True.	`True`

Returns:

Type	Description
`DataFrame`	A table showing aliasing periods for each tidal constituent across the given satellites. The result is styled with a color gradient if `style=True`, or returned as a plain DataFrame.

Examples:

>>> eo_tide_aliasing(["sentinel-2", "landsat-8"])
>>> eo_tide_aliasing(["swot"], c=["m2", "k1"], units="hours", style=False)
>>> eo_tide_aliasing({"custom-sat": 5})

Source code in eo_tides/stats.py

def tide_aliasing(
    satellites: list[str] | dict[str, float],
    c: list[str] | None = None,
    units: str = "days",
    max_inf: float = 1_000_000,
    style: bool = True,
):
    """Calculate aliasing periods for tidal constituents given satellite revisit intervals.

    This function uses `pyTMD.arguments.aliasing_period` to calculate the
    aliasing periods between satellite overpass periods and the natural
    cycles of tidal constituents. The aliasing period describes how long
    it would take for a satellite to sample the entire tidal cycle for
    each constituent, based on the satellite's observation frequency.

    Short aliasing periods mean the satellite will observe the full range
    of tidal variation relatively quickly, reducing the risk of tide-related
    bias. Long aliasing periods indicate that it will take much longer to
    sample all tidal conditions, increasing the risk that satellite analyses
    may misrepresent tidal dynamics.

    Revisit periods are approximate and based on nominal repeat cycles at the equator.
    Actual observation frequency may vary due to latitude, cloud cover, sensor
    availability, and acquisition planning. Custom revisit intervals can be
    defined by passing a dictionary to `satellites`, e.g. `satellites={"custom-sat": 5}`.

    For more information, refer to https://pytmd.readthedocs.io/en/latest/api_reference/arguments.html#pyTMD.arguments.aliasing_period

    Parameters
    ----------
    satellites : list of str or dict
        List of satellite names to analyse, or a custom dictionary with
        satellite names as keys and revisit frequency in days as values.
        Supported satellites include:

        - Landsat (optical):
            - Two satellites combined: "landsat"
            - Individual: "landsat-5", "landsat-7", "landsat-8", "landsat-9"
        - Sentinel-2 (optical):
            - Two satellites combined: "sentinel-2"
            - Individual: "sentinel-2a", "sentinel-2b", "sentinel-2c"
        - Sentinel-1 (C-band SAR):
            - Two satellites combined: "sentinel-1"
            - Individual: "sentinel-1a", "sentinel-1b", "sentinel-1c"
        - Sentinel-3:
            - OLCI (optical): "sentinel-3a-olci", "sentinel-3b-olci", "sentinel-3c-olci"
            - SLSTR (thermal): "sentinel-3a-slstr", "sentinel-3b-slstr, "sentinel-3c-slstr"
            - SRAL (altimetry): "sentinel-3a-sral", "sentinel-3b-sral, "sentinel-3c-sral"
        - SWOT (KaRIn swath altimetry): "swot"
        - NISAR (L- and S-band SAR): "nisar"
    c : list of str or None, optional
        List of tidal constituents to include. If None, all known constituents
        are included. Constituent names should be lowercase (e.g., "m2", "k1").
    units : str, optional
        Output time units for the aliasing periods. Must be one of:
        "years", "days", "hours", or "minutes". Default is "days".
    max_inf : float, optional
        Maximum aliasing period to display. Values exceeding this threshold
        are replaced with `np.inf`. Default is 1,000,000.
    style : bool, optional
        If True, returns a styled `pandas.DataFrame`. If False, returns a raw
        DataFrame. Default is True.

    Returns
    -------
    pandas.DataFrame
        A table showing aliasing periods for each tidal constituent across the given satellites.
        The result is styled with a color gradient if `style=True`, or returned as a plain DataFrame.

    Examples
    --------
    >>> eo_tide_aliasing(["sentinel-2", "landsat-8"])
    >>> eo_tide_aliasing(["swot"], c=["m2", "k1"], units="hours", style=False)
    >>> eo_tide_aliasing({"custom-sat": 5})

    """
    # If satellites is a dict
    if isinstance(satellites, dict):
        revisit_dict = satellites | REVISIT_DICT
        satellites = list(satellites.keys())
    else:
        revisit_dict = REVISIT_DICT

    # Validate satellite names
    invalid_sats = set(satellites) - set(revisit_dict)
    if invalid_sats:
        valid = ", ".join(sorted(revisit_dict))
        error_msg = f"Unknown satellite(s): {', '.join(invalid_sats)}. Must be one of: {valid}"
        raise ValueError(error_msg)

    # Time unit factors
    unit_factors = {
        "seconds": 1,
        "minutes": 60,
        "hours": 3600,
        "days": 86400,
        "years": 365.25 * 86400,
    }

    # Use default list of constituents if none provided
    if c is None:
        c = list(C_NAMES.keys())

    # Unpack all parameters (amplitude, phase, omega, alpha, species)
    params = [_constituent_parameters(c_i) for c_i in c]
    _, _, omega, _, _ = np.array(params).T

    # Convert frequency in radians per second to period in seconds
    period = 2 * np.pi / omega

    # Compute aliasing period for each satellite
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            category=RuntimeWarning,
            message="divide by zero encountered in divide",
        )
        aliasing_periods = {}
        for sat in satellites:
            revisit = revisit_dict[sat]
            print(f"Using {revisit} day revisit for {sat}")
            aliasing_periods[("aliasing_period", sat)] = aliasing_period(
                c,
                unit_factors["days"] * revisit,
            )

    # Combine into a dataframe
    alias_df = pd.DataFrame(
        index=c,
        data={
            ("period", ""): period,
            **aliasing_periods,
        },
    )

    # Raise error if unit is not supported
    if units not in unit_factors:
        error_msg = f"Unit not supported: {units}; must be one of 'years', 'days', 'hours', 'minutes'"
        raise ValueError(error_msg)

    # Rescale
    alias_df = (alias_df / unit_factors[units]).round(3)

    # Max value to set to inf:
    alias_df[alias_df > max_inf] = np.inf

    # Add additional columns
    alias_df.insert(0, column="name", value=[C_NAMES[c_i] for c_i in alias_df.index])
    alias_df.insert(1, column="type", value=[C_TYPE[c_i] for c_i in alias_df.index])

    # Style and return
    df_subset = alias_df.loc[:, "aliasing_period"]
    max_col = np.nanquantile(df_subset[np.isfinite(df_subset)].values, 0.9)

    if style:
        return alias_df.style.background_gradient(
            axis=None,
            vmin=0,
            vmax=max_col * 1.5,
            cmap="YlOrRd",
        ).format(precision=3)
    return alias_df

tide_stats ¶

tide_stats(
    data,
    time=None,
    model="EOT20",
    directory=None,
    tidepost_lat=None,
    tidepost_lon=None,
    plain_english=True,
    plot=True,
    plot_var=None,
    point_col=None,
    modelled_freq="3h",
    min_max_q=(0.0, 1.0),
    round_stats=3,
    **tag_tides_kwargs
)

Generate tide statistics and satellite tide bias metrics for every dataset timestep.

Takes a multi-dimensional dataset and generate tide statistics and satellite-observed tide bias metrics, calculated based on every timestep in the satellite data and the geographic centroid of the imagery.

By comparing the subset of tides observed by satellites against the full astronomical tidal range, we can evaluate whether the tides observed by satellites are biased (e.g. fail to observe either the highest or lowest tides) due to tide aliasing interactions with sun-synchronous satellite overpasses.

For more information about the tidal statistics computed by this function, refer to Figure 8 in Bishop-Taylor et al. 2018: https://www.sciencedirect.com/science/article/pii/S0272771418308783#fig8

Parameters:

Name	Type	Description	Default
`data` ¶	`Dataset or DataArray or GeoBox`	A multi-dimensional dataset or GeoBox pixel grid that will be used to calculate tide statistics. If `data` is an xarray object, it should include a "time" dimension. If no "time" dimension exists or if `data` is a GeoBox, then times must be passed using the `time` parameter.	required
`time` ¶	`DatetimeLike`	By default, tides will be modelled using times from the "time" dimension of `data`. Alternatively, this param can be used to provide a custom set of times. Accepts any format that can be converted by `pandas.to_datetime()`. For example: `time=pd.date_range(start="2000", end="2001", freq="5h")`	`None`
`model` ¶	`str or list of str`	The tide model (or list of models) to use to model tides. If a list is provided, the resulting statistics will be returned as a `pandas.Dataframe`; otherwise a `pandas.Series` will be returned. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, run `from eo_tides.utils import list_models; list_models()`.	`'EOT20'`
`directory` ¶	`str`	The directory containing tide model data files. If no path is provided, this will default to the environment variable `EO_TIDES_TIDE_MODELS` if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by `pyTMD` (https://geoscienceaustralia.github.io/eo-tides/setup/).	`None`
`tidepost_lat` ¶	`float or int`	Optional coordinates used to model tides. The default is None, which uses the centroid of the dataset as the tide modelling location.	`None`
`tidepost_lon` ¶	`float or int`	Optional coordinates used to model tides. The default is None, which uses the centroid of the dataset as the tide modelling location.	`None`
`plain_english` ¶	`bool`	An optional boolean indicating whether to print a plain english version of the tidal statistics to the screen. Defaults to True; only supported when a single tide model is passed to `model`.	`True`
`plot` ¶	`bool`	An optional boolean indicating whether to plot how satellite- observed tide heights compare against the full tidal range. Defaults to True; only supported when a single tide model is passed to `model`.	`True`
`plot_var` ¶	`str`	Optional name of a coordinate, dimension or variable in the array that will be used to plot observations with unique symbols. Defaults to None, which will plot all observations as circles.	`None`
`point_col` ¶	`str`	Colour used to plot points on the graph. Defaults to None which will automatically select colours.	`None`
`modelled_freq` ¶	`str`	An optional string giving the frequency at which to model tides when computing the full modelled tidal range. Defaults to '3h', which computes a tide height for every three hours across the temporal extent of `data`.	`'3h'`
`min_max_q` ¶	`tuple`	Quantiles used to calculate max and min observed and modelled astronomical tides. By default `(0.0, 1.0)` which is equivalent to minimum and maximum; to use a softer threshold that is more robust to outliers, use e.g. `(0.1, 0.9)`.	`(0.0, 1.0)`
`round_stats` ¶	`int`	The number of decimal places used to round the output statistics. Defaults to 3.	`3`
`**tag_tides_kwargs` ¶		Optional parameters passed to the `eo_tides.eo.tag_tides` function that is used to model tides for each observed and modelled timestep.	`{}`

Returns:

Name	Type	Description
`stats_df`	`Series or Dataframe`	A pandas object containing the following statistics: `y`: latitude used for modelling tide heights `x`: longitude used for modelling tide heights `mot`: mean tide height observed by the satellite (metres) `mat`: mean modelled astronomical tide height (metres) `lot`: minimum tide height observed by the satellite (metres) `lat`: minimum tide height from modelled astronomical tidal range (metres) `hot`: maximum tide height observed by the satellite (metres) `hat`: maximum tide height from modelled astronomical tidal range (metres) `otr`: tidal range observed by the satellite (metres) `tr`: modelled astronomical tide range (metres) `spread`: proportion of the full modelled tidal range observed by the satellite `offset_low`: proportion of the lowest tides never observed by the satellite `offset_high`: proportion of the highest tides never observed by the satellite

Source code in eo_tides/stats.py

def tide_stats(
    data: xr.Dataset | xr.DataArray | GeoBox,
    time: DatetimeLike | None = None,
    model: str = "EOT20",
    directory: str | os.PathLike | None = None,
    tidepost_lat: float | None = None,
    tidepost_lon: float | None = None,
    plain_english: bool = True,
    plot: bool = True,
    plot_var: str | None = None,
    point_col: str | None = None,
    modelled_freq: str = "3h",
    min_max_q: tuple = (0.0, 1.0),
    round_stats: int = 3,
    **tag_tides_kwargs,
) -> pd.Series:
    """Generate tide statistics and satellite tide bias metrics for every dataset timestep.

    Takes a multi-dimensional dataset and generate tide statistics
    and satellite-observed tide bias metrics, calculated based on
    every timestep in the satellite data and the geographic centroid
    of the imagery.

    By comparing the subset of tides observed by satellites
    against the full astronomical tidal range, we can evaluate
    whether the tides observed by satellites are biased
    (e.g. fail to observe either the highest or lowest tides) due
    to tide aliasing interactions with sun-synchronous satellite
    overpasses.

    For more information about the tidal statistics computed by this
    function, refer to Figure 8 in Bishop-Taylor et al. 2018:
    https://www.sciencedirect.com/science/article/pii/S0272771418308783#fig8

    Parameters
    ----------
    data : xarray.Dataset or xarray.DataArray or odc.geo.geobox.GeoBox
        A multi-dimensional dataset or GeoBox pixel grid that will
        be used to calculate tide statistics. If `data` is an
        xarray object, it should include a "time" dimension.
        If no "time" dimension exists or if `data` is a GeoBox,
        then times must be passed using the `time` parameter.
    time : DatetimeLike, optional
        By default, tides will be modelled using times from the
        "time" dimension of `data`. Alternatively, this param can
        be used to provide a custom set of times. Accepts any format
        that can be converted by `pandas.to_datetime()`. For example:
        `time=pd.date_range(start="2000", end="2001", freq="5h")`
    model : str or list of str, optional
        The tide model (or list of models) to use to model tides.
        If a list is provided, the resulting statistics will be
        returned as a `pandas.Dataframe`; otherwise a `pandas.Series`
        will be returned. Defaults to "EOT20"; specify "all" to use
        all models available in `directory`. For a full list of
        available and supported models, run
        `from eo_tides.utils import list_models; list_models()`.
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    tidepost_lat, tidepost_lon : float or int, optional
        Optional coordinates used to model tides. The default is None,
        which uses the centroid of the dataset as the tide modelling
        location.
    plain_english : bool, optional
        An optional boolean indicating whether to print a plain english
        version of the tidal statistics to the screen. Defaults to True;
        only supported when a single tide model is passed to `model`.
    plot : bool, optional
        An optional boolean indicating whether to plot how satellite-
        observed tide heights compare against the full tidal range.
        Defaults to True; only supported when a single tide model is
        passed to `model`.
    plot_var : str, optional
        Optional name of a coordinate, dimension or variable in the array
        that will be used to plot observations with unique symbols.
        Defaults to None, which will plot all observations as circles.
    point_col : str, optional
        Colour used to plot points on the graph. Defaults to None which
        will automatically select colours.
    modelled_freq : str, optional
        An optional string giving the frequency at which to model tides
        when computing the full modelled tidal range. Defaults to '3h',
        which computes a tide height for every three hours across the
        temporal extent of `data`.
    min_max_q : tuple, optional
        Quantiles used to calculate max and min observed and modelled
        astronomical tides. By default `(0.0, 1.0)` which is equivalent
        to minimum and maximum; to use a softer threshold that is more
        robust to outliers, use e.g. `(0.1, 0.9)`.
    round_stats : int, optional
        The number of decimal places used to round the output statistics.
        Defaults to 3.
    **tag_tides_kwargs :
        Optional parameters passed to the `eo_tides.eo.tag_tides`
        function that is used to model tides for each observed and
        modelled timestep.

    Returns
    -------
    stats_df : pandas.Series or pandas.Dataframe
        A pandas object containing the following statistics:

        - `y`: latitude used for modelling tide heights
        - `x`: longitude used for modelling tide heights
        - `mot`: mean tide height observed by the satellite (metres)
        - `mat`: mean modelled astronomical tide height (metres)
        - `lot`: minimum tide height observed by the satellite (metres)
        - `lat`: minimum tide height from modelled astronomical tidal range (metres)
        - `hot`: maximum tide height observed by the satellite (metres)
        - `hat`: maximum tide height from modelled astronomical tidal range (metres)
        - `otr`: tidal range observed by the satellite (metres)
        - `tr`: modelled astronomical tide range (metres)
        - `spread`: proportion of the full modelled tidal range observed by the satellite
        - `offset_low`: proportion of the lowest tides never observed by the satellite
        - `offset_high`: proportion of the highest tides never observed by the satellite

    """
    # Standardise data inputs, time and models
    gbox, obs_times = _standardise_inputs(data, time)

    # Generate range of times covering entire period of satellite record
    assert obs_times is not None  # noqa: S101
    all_times = pd.date_range(
        start=obs_times.min().item(),
        end=obs_times.max().item(),
        freq=modelled_freq,
    )

    # If custom tide modelling locations are not provided, use the
    # dataset centroid
    if not tidepost_lat or not tidepost_lon:
        tidepost_lon, tidepost_lat = gbox.geographic_extent.centroid.coords[0]

    # Model tides for observed timesteps
    obs_tides_da = tag_tides(
        gbox,
        time=obs_times,
        model=model,
        directory=directory,
        tidepost_lat=tidepost_lat,
        tidepost_lon=tidepost_lon,
        **tag_tides_kwargs,
    )

    # Model tides for all modelled timesteps
    all_tides_da = tag_tides(
        gbox,
        time=all_times,
        model=model,
        directory=directory,
        tidepost_lat=tidepost_lat,
        tidepost_lon=tidepost_lon,
        **tag_tides_kwargs,
    )

    # Calculate statistics
    # # (cast ensures typing knows these are always DataArrays)
    stats_ds = _tide_statistics(
        cast(xr.DataArray, obs_tides_da),
        cast(xr.DataArray, all_tides_da),
        min_max_q=min_max_q,
    )

    # Convert to pandas and add tide post coordinates
    stats_df = stats_ds.to_pandas().astype("float32")
    stats_df["x"] = tidepost_lon
    stats_df["y"] = tidepost_lat

    # Convert coordinates to index if dataframe
    if isinstance(stats_df, pd.DataFrame):
        stats_df = stats_df.set_index(["x", "y"], append=True)

    # If a series, print and plot summaries
    else:
        if plain_english:
            _stats_plain_english(
                mot=stats_df.mot,
                mat=stats_df.mat,
                hot=stats_df.hot,
                hat=stats_df.hat,
                lot=stats_df.lot,
                lat=stats_df.lat,
                otr=stats_df.otr,
                tr=stats_df.tr,
                spread=stats_df.spread,
                offset_low=stats_df.offset_low,
                offset_high=stats_df.offset_high,
            )

        if plot:
            _stats_figure(
                all_tides_da=all_tides_da,
                obs_tides_da=obs_tides_da,
                hot=stats_df.hot,
                hat=stats_df.hat,
                lot=stats_df.lot,
                lat=stats_df.lat,
                spread=stats_df.spread,
                offset_low=stats_df.offset_low,
                offset_high=stats_df.offset_high,
                plot_var=data[plot_var] if plot_var else None,
                point_col=point_col,
            )

    # Return in Pandas format
    return stats_df.round(round_stats)

eo_tides.validation ¶

Validation tools for comparing modelled tides to observed tide gauge data.

This module provides functions for loading, filtering, and analysing observed tide gauge data to validate modelled tide heights.

Functions:

Name	Description
`eval_metrics`	Calculate common statistical validation metrics.
`load_gauge_gesla`	Load Global Extreme Sea Level Analysis (GESLA) tide gauge data.

eval_metrics ¶

eval_metrics(x, y, round=3, all_regress=False)

Calculate common statistical validation metrics.

These include:

Pearson correlation
Root Mean Squared Error
Mean Absolute Error
R-squared
Bias
Linear regression parameters (slope, p-value, intercept, standard error)

Parameters:

Name	Type	Description	Default
`x` ¶	`array`	An array providing "actual" variable values.	required
`y` ¶	`array`	An array providing "predicted" variable values.	required
`round` ¶	`int`	Number of decimal places to round each metric to. Defaults to 3.	`3`
`all_regress` ¶	`bool`	Whether to return linear regression p-value, intercept and standard error (in addition to only regression slope). Defaults to False.	`False`

Returns:

Type	Description
`Series`	A `pd.Series` containing all calculated metrics.

Source code in eo_tides/validation.py

def eval_metrics(x, y, round=3, all_regress=False):  # noqa: A002
    """Calculate common statistical validation metrics.

    These include:

    * Pearson correlation
    * Root Mean Squared Error
    * Mean Absolute Error
    * R-squared
    * Bias
    * Linear regression parameters (slope, p-value, intercept, standard error)

    Parameters
    ----------
    x : numpy.array
        An array providing "actual" variable values.
    y : numpy.array
        An array providing "predicted" variable values.
    round : int
        Number of decimal places to round each metric
        to. Defaults to 3.
    all_regress : bool
        Whether to return linear regression p-value,
        intercept and standard error (in addition to
        only regression slope). Defaults to False.

    Returns
    -------
    pandas.Series
        A `pd.Series` containing all calculated metrics.

    """
    # Create dataframe to drop na
    xy_df = pd.DataFrame({"x": x, "y": y}).dropna()

    # Compute linear regression
    lin_reg = stats.linregress(x=xy_df.x, y=xy_df.y)

    # Calculate statistics
    stats_dict = {
        "Correlation": xy_df.corr().iloc[0, 1],
        "RMSE": sqrt(mean_squared_error(xy_df.x, xy_df.y)),
        "MAE": mean_absolute_error(xy_df.x, xy_df.y),
        "R-squared": lin_reg.rvalue**2,
        "Bias": (xy_df.y - xy_df.x).mean(),
        "Regression slope": lin_reg.slope,
    }

    # Additional regression params
    if all_regress:
        stats_dict.update({
            "Regression p-value": lin_reg.pvalue,
            "Regression intercept": lin_reg.intercept,
            "Regression standard error": lin_reg.stderr,
        })

    # Return as
    return pd.Series(stats_dict).round(round)

load_gauge_gesla ¶

load_gauge_gesla(
    x=None,
    y=None,
    site_code=None,
    time=("2018", "2020"),
    max_distance=None,
    correct_mean=False,
    filter_use_flag=True,
    site_metadata=True,
    data_path="GESLA3.0_ALL",
    metadata_path="",
)

Load Global Extreme Sea Level Analysis (GESLA) tide gauge data.

Load and process all available GESLA measured sea-level data with an x, y, time spatio-temporal query, or from a list of specific tide gauges. Can optionally filter by gauge quality and append detailed gauge metadata.

Modified from original code in https://github.com/philiprt/GeslaDataset.

Parameters:

Name	Type	Description	Default
`x` ¶	`numeric or list / tuple`	Coordinates (in degrees longitude, latitude) used to load GESLA tide gauge observations. If provided as singular values (e.g. `x=150, y=-32`), then the nearest tide gauge will be returned. If provided as a list or tuple (e.g. `x=(150, 152), y=(-32, -30)`), then all gauges within the provided bounding box will be loaded. Leave as `None` to return all available gauges, or if providing a list of site codes using `site_code`.	`None`
`y` ¶	`numeric or list / tuple`	Coordinates (in degrees longitude, latitude) used to load GESLA tide gauge observations. If provided as singular values (e.g. `x=150, y=-32`), then the nearest tide gauge will be returned. If provided as a list or tuple (e.g. `x=(150, 152), y=(-32, -30)`), then all gauges within the provided bounding box will be loaded. Leave as `None` to return all available gauges, or if providing a list of site codes using `site_code`.	`None`
`site_code` ¶	`str or list of str`	GESLA site code(s) for which to load data (e.g. `site_code="62650"`). If `site_code` is provided, `x` and `y` will be ignored.	`None`
`time` ¶	`tuple or list of str`	Time range to consider, given as a tuple of start and end dates, e.g. `time=("2020", "2021")`. The default of None will return all tide observations from the year 1800 onward.	`('2018', '2020')`
`max_distance` ¶	`numeric`	Optional max distance within which to return the nearest tide gauge when `x` and `y` are provided as singular coordinates. Defaults to None, which will always return a tide gauge no matter how far away it is located from `x` and `y`.	`None`
`correct_mean` ¶	`bool`	Whether to correct sea level measurements to a standardised mean sea level by subtracting the mean of all observed sea level observations. This can be useful when GESLA tide heights come from different or unknown tide datums. Note: the observed mean sea level calculated here may differ from true long-term/ astronomical Mean Sea Level (MSL) datum.	`False`
`filter_use_flag` ¶	`bool`	Whether to filter out low quality observations with a "use_flag" value of 0 (do not use). Defaults to True.	`True`
`site_metadata` ¶	`bool`	Whether to add tide gauge station metadata as additional columns in the output DataFrame. Defaults to True.	`True`
`data_path` ¶	`str`	Path to the raw GESLA data files ("GESLA-3 DATA", accessible via: https://gesla787883612.wordpress.com/downloads/). Defaults to "GESLA3.0_ALL".	`'GESLA3.0_ALL'`
`metadata_path` ¶	`str`	Path to the GESLA station metadata file ("GESLA-3 CSV META-DATA FILE", accessible via: https://gesla787883612.wordpress.com/downloads/). Defaults to "GESLA3_ALL 2.csv".	`''`

Returns:

Type	Description
`DataFrame`	Processed GESLA data as a DataFrame with columns including: "time": Timestamps, "sea_level": Observed sea level (m), "qc_flag": Observed sea level QC flag, "use_flag": Use-in-analysis flag (1 = use, 0 = do not use), ...and additional columns from station metadata.

Source code in eo_tides/validation.py

def load_gauge_gesla(
    x=None,
    y=None,
    site_code=None,
    time=("2018", "2020"),
    max_distance=None,
    correct_mean=False,
    filter_use_flag=True,
    site_metadata=True,
    data_path="GESLA3.0_ALL",
    metadata_path="",
):
    """Load Global Extreme Sea Level Analysis (GESLA) tide gauge data.

    Load and process all available GESLA measured sea-level data
    with an `x, y, time` spatio-temporal query, or from a list of
    specific tide gauges. Can optionally filter by gauge quality
    and append detailed gauge metadata.

    Modified from original code in <https://github.com/philiprt/GeslaDataset>.

    Parameters
    ----------
    x, y : numeric or list/tuple, optional
        Coordinates (in degrees longitude, latitude) used to load GESLA
        tide gauge observations. If provided as singular values
        (e.g. `x=150, y=-32`), then the nearest tide gauge will be returned.
        If provided as a list or tuple (e.g. `x=(150, 152), y=(-32, -30)`),
        then all gauges within the provided bounding box will be loaded.
        Leave as `None` to return all available gauges, or if providing a
        list of site codes using `site_code`.
    site_code : str or list of str, optional
        GESLA site code(s) for which to load data (e.g. `site_code="62650"`).
        If `site_code` is provided, `x` and `y` will be ignored.
    time : tuple or list of str, optional
        Time range to consider, given as a tuple of start and end dates,
        e.g. `time=("2020", "2021")`. The default of None will return all
        tide observations from the year 1800 onward.
    max_distance : numeric, optional
        Optional max distance within which to return the nearest tide gauge
        when `x` and `y` are provided as singular coordinates. Defaults to
        None, which will always return a tide gauge no matter how far away
        it is located from `x` and `y`.
    correct_mean : bool, optional
        Whether to correct sea level measurements to a standardised mean
        sea level by subtracting the mean of all observed sea level
        observations. This can be useful when GESLA tide heights come
        from different or unknown tide datums. Note: the observed mean
        sea level calculated here may differ from true long-term/
        astronomical Mean Sea Level (MSL) datum.
    filter_use_flag : bool, optional
        Whether to filter out low quality observations with a "use_flag"
        value of 0 (do not use). Defaults to True.
    site_metadata : bool, optional
        Whether to add tide gauge station metadata as additional columns
        in the output DataFrame. Defaults to True.
    data_path : str, optional
        Path to the raw GESLA data files ("GESLA-3 DATA", accessible via:
        https://gesla787883612.wordpress.com/downloads/). Defaults to
        "GESLA3.0_ALL".
    metadata_path : str, optional
        Path to the GESLA station metadata file ("GESLA-3 CSV META-DATA FILE",
        accessible via: https://gesla787883612.wordpress.com/downloads/).
        Defaults to "GESLA3_ALL 2.csv".

    Returns
    -------
    pd.DataFrame
        Processed GESLA data as a DataFrame with columns including:

        - "time": Timestamps,
        - "sea_level": Observed sea level (m),
        - "qc_flag": Observed sea level QC flag,
        - "use_flag": Use-in-analysis flag (1 = use, 0 = do not use),

        ...and additional columns from station metadata.

    """
    # Expand and validate data and metadata paths
    data_path = Path(data_path).expanduser()
    metadata_path = Path(metadata_path).expanduser()

    if not data_path.exists():
        err_msg = (
            f"GESLA raw data directory not found at: {data_path}\n"
            "Download 'GESLA-3 DATA' from: "
            "https://gesla787883612.wordpress.com/downloads/",
        )
        raise FileNotFoundError(err_msg)

    if not metadata_path.exists():
        err_msg = (
            f"GESLA station metadata file not found at: {metadata_path}\n"
            "Download the 'GESLA-3 CSV META-DATA FILE' from: "
            "https://gesla787883612.wordpress.com/downloads/",
        )
        raise FileNotFoundError(err_msg)

    # Load tide gauge metadata
    metadata_df, metadata_gdf = _load_gauge_metadata(metadata_path)

    # Use supplied site codes if available
    if site_code is not None:
        site_code = [site_code] if not isinstance(site_code, list) else site_code

    # If x and y are tuples, use xy bounds to identify sites
    elif isinstance(x, tuple | list) & isinstance(y, tuple | list):
        bbox = BoundingBox.from_xy(x, y)
        site_code = metadata_gdf.cx[bbox.left : bbox.right, bbox.top : bbox.bottom].index

    # If x and y are single numbers, select nearest row
    elif isinstance(x, Number) & isinstance(y, Number):
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            site_code = (
                _nearest_row(metadata_gdf, x, y, max_distance).rename({"index_right": "site_code"}, axis=1).site_code
            )

        # Raise exception if no valid tide gauges are found
        if site_code.isna().all():
            err_msg = f"No tide gauge found within {max_distance} degrees of {x}, {y}."
            raise Exception(err_msg)

    # Otherwise if all are None, return all available site codes
    elif (site_code is None) & (x is None) & (y is None):
        site_code = metadata_df.index.to_list()

    else:
        err_msg = (
            "`x` and `y` must be provided as either singular coordinates (e.g. `x=150`), or as a tuple bounding box (e.g. `x=(150, 152)`).",
        )
        raise Exception(err_msg)

    # Prepare times
    if time is None:
        time = ["1800", str(datetime.datetime.now().year)]
    time = [time] if not isinstance(time, list | tuple) else time
    start_time = _round_date_strings(time[0], round_type="start")
    end_time = _round_date_strings(time[-1], round_type="end")

    # Identify paths to load and nodata values for each site
    metadata_df["file_name"] = data_path / metadata_df["file_name"]
    paths_na = metadata_df.loc[site_code, ["file_name", "null_value"]]

    # Load and combine into a single dataframe
    gauge_list = [
        _load_gesla_dataset(s, p, na_value=na)
        for s, p, na in tqdm.tqdm(paths_na.itertuples(), total=len(paths_na), desc="Loading GESLA gauges")
    ]
    data_df = pd.concat(gauge_list).sort_index().loc[slice(start_time, end_time)].reset_index().set_index("site_code")

    # Optionally filter by use flag column
    if filter_use_flag:
        data_df = data_df.loc[data_df.use_flag == 1]

    # Optionally insert metadata into dataframe
    if site_metadata:
        data_df[metadata_df.columns] = metadata_df.loc[site_code]

    # Add time to index and remove duplicates
    data_df = data_df.set_index("time", append=True)
    duplicates = data_df.index.duplicated()
    if duplicates.sum() > 0:
        warnings.warn("Duplicate timestamps were removed.", stacklevel=2)
        data_df = data_df.loc[~duplicates]

    # Remove observed mean sea level if requested
    if correct_mean:
        data_df["sea_level"] = data_df["sea_level"].sub(data_df.groupby("site_code")["sea_level"].transform("mean"))

    # Return data
    return data_df

eo_tides.utils ¶

General-purpose utilities for tide model setup and data processing.

This module includes tools for listing and clipping model files, performing spatial interpolation, and other helper tools used across the eo_tides package.

Functions:

Name	Description
`clip_models`	Clip NetCDF-format ocean tide models to a bounding box.
`idw`	Perform Inverse Distance Weighting (IDW) interpolation.
`list_models`	List all tide models available for tide modelling.

clip_models ¶

clip_models(
    input_directory,
    output_directory,
    bbox,
    model=None,
    buffer=5,
    overwrite=False,
)

Clip NetCDF-format ocean tide models to a bounding box.

This function identifies all NetCDF-format tide models in a given input directory, including "ATLAS-netcdf" (e.g. TPXO9-atlas-nc), "FES-netcdf" (e.g. FES2022, EOT20), and "GOT-netcdf" (e.g. GOT5.5) format files. Files for each model are then clipped to the extent of the provided bounding box, handling model-specific file structures. After each model is clipped, the result is exported to the output directory and verified with pyTMD to ensure the clipped data is suitable for tide modelling.

For tide model setup instructions, refer to the guide: https://geoscienceaustralia.github.io/eo-tides/setup/

Parameters:

Name	Type	Description	Default
`input_directory` ¶	`str or PathLike`	Path to directory containing input NetCDF-format tide model files.	required
`output_directory` ¶	`str or PathLike`	Path to directory where clipped NetCDF files will be exported.	required
`bbox` ¶	`tuple of float`	Bounding box for clipping the tide models in EPSG:4326 degrees coordinates, specified as `(left, bottom, right, top)`.	required
`model` ¶	`str or list of str`	The tide model (or models) to clip. Defaults to None, which will automatically identify and clip all NetCDF-format models in the input directly.	`None`
`buffer` ¶	`float`	Buffer distance (in degrees) added to the bounding box to provide sufficient data on edges of study area. Defaults to 5 degrees.	`5`
`overwrite` ¶	`bool`	If True, overwrite existing files in the output directory. Defaults to False.	`False`

Examples:

>>> clip_models(
...     input_directory="tide_models/",
...     output_directory="tide_models_clipped/",
...     bbox=(-8.968392, 50.070574, 2.447160, 59.367122),
... )

Source code in eo_tides/utils.py

def clip_models(
    input_directory: str | os.PathLike,
    output_directory: str | os.PathLike,
    bbox: tuple[float, float, float, float],
    model: list | None = None,
    buffer: float = 5,
    overwrite: bool = False,
) -> None:
    """Clip NetCDF-format ocean tide models to a bounding box.

    This function identifies all NetCDF-format tide models in a
    given input directory, including "ATLAS-netcdf" (e.g. TPXO9-atlas-nc),
    "FES-netcdf" (e.g. FES2022, EOT20), and "GOT-netcdf" (e.g. GOT5.5)
    format files. Files for each model are then clipped to the extent of
    the provided bounding box, handling model-specific file structures.
    After each model is clipped, the result is exported to the output
    directory and verified with `pyTMD` to ensure the clipped data is
    suitable for tide modelling.

    For tide model setup instructions, refer to the guide:
    https://geoscienceaustralia.github.io/eo-tides/setup/

    Parameters
    ----------
    input_directory : str or os.PathLike
        Path to directory containing input NetCDF-format tide model files.
    output_directory : str or os.PathLike
        Path to directory where clipped NetCDF files will be exported.
    bbox : tuple of float
        Bounding box for clipping the tide models in EPSG:4326 degrees
        coordinates, specified as `(left, bottom, right, top)`.
    model : str or list of str, optional
        The tide model (or models) to clip. Defaults to None, which
        will automatically identify and clip all NetCDF-format models
        in the input directly.
    buffer : float, optional
        Buffer distance (in degrees) added to the bounding box to provide
        sufficient data on edges of study area. Defaults to 5 degrees.
    overwrite : bool, optional
        If True, overwrite existing files in the output directory.
        Defaults to False.

    Examples
    --------
    >>> clip_models(
    ...     input_directory="tide_models/",
    ...     output_directory="tide_models_clipped/",
    ...     bbox=(-8.968392, 50.070574, 2.447160, 59.367122),
    ... )

    """
    # Get input and output paths
    input_directory = _set_directory(input_directory)
    output_directory = pathlib.Path(output_directory)

    # Prepare bounding box
    bbox = odc.geo.geom.BoundingBox(*bbox, crs="EPSG:4326").buffered(buffer)

    # Identify NetCDF models
    model_database = load_database()["elevation"]
    netcdf_formats = ["ATLAS-netcdf", "FES-netcdf", "GOT-netcdf"]
    netcdf_models = {k for k, v in model_database.items() if v["format"] in netcdf_formats}

    # Identify subset of available and requested NetCDF models
    available_models, _ = list_models(directory=input_directory, show_available=False, show_supported=False)
    requested_models = list(np.atleast_1d(model)) if model is not None else available_models
    available_netcdf_models = list(set(available_models) & set(requested_models) & set(netcdf_models))

    # Raise error if no valid models found
    if len(available_netcdf_models) == 0:
        err_msg = f"No valid NetCDF models found in {input_directory}."
        raise ValueError(err_msg)

    # If model list is provided,
    print(f"Preparing to clip suitable NetCDF models: {available_netcdf_models}\n")

    # Loop through suitable models and export
    for m in available_netcdf_models:
        # Get model file and grid file list if they exist
        model_files = model_database[m].get("model_file", [])
        grid_file = model_database[m].get("grid_file", [])

        # Convert to list of strings and combine
        model_files = model_files if isinstance(model_files, list) else [model_files]
        grid_file = grid_file if isinstance(grid_file, list) else [grid_file]
        all_files = model_files + grid_file

        # Loop through each model file and clip
        for file in tqdm(all_files, desc=f"Clipping {m}"):
            # Skip if it exists in output directory
            if (output_directory / file).exists() and not overwrite:
                continue

            # Load model file
            nc = xr.open_mfdataset(input_directory / file)

            # Open file and clip according to model
            if m in (
                "GOT5.5",
                "GOT5.5_load",
                "GOT5.5_extrapolated",
                "GOT5.5D",
                "GOT5.5D_extrapolated",
                "GOT5.6",
                "GOT5.6_extrapolated",
            ):
                nc_clipped = _clip_model_file(
                    nc,
                    bbox,
                    xdim="lon",
                    ydim="lat",
                    ycoord="latitude",
                    xcoord="longitude",
                )

            elif m in ("HAMTIDE11",):
                nc_clipped = _clip_model_file(nc, bbox, xdim="LON", ydim="LAT", ycoord="LAT", xcoord="LON")

            elif m in (
                "EOT20",
                "EOT20_load",
                "FES2012",
                "FES2014",
                "FES2014_extrapolated",
                "FES2014_load",
                "FES2022",
                "FES2022_extrapolated",
                "FES2022_load",
            ):
                nc_clipped = _clip_model_file(nc, bbox, xdim="lon", ydim="lat", ycoord="lat", xcoord="lon")

            elif m in (
                "TPXO8-atlas-nc",
                "TPXO9-atlas-nc",
                "TPXO9-atlas-v2-nc",
                "TPXO9-atlas-v3-nc",
                "TPXO9-atlas-v4-nc",
                "TPXO9-atlas-v5-nc",
                "TPXO10-atlas-v2-nc",
            ):
                nc_clipped = _clip_model_file(
                    nc,
                    bbox,
                    xdim="nx",
                    ydim="ny",
                    ycoord="lat_z",
                    xcoord="lon_z",
                )

            else:
                err_msg = f"Model {m} not supported"
                raise Exception(err_msg)

            # Create directory and export
            (output_directory / file).parent.mkdir(parents=True, exist_ok=True)
            nc_clipped.to_netcdf(output_directory / file, mode="w")

        # Verify that models are ready
        if pyTMD.io.model(directory=output_directory).elevation(m=m).verify:
            print(" ✅ Clipped model exported and verified")
        else:
            print(" ❌ Clipped model exported but unable to be verified")

    print(f"\nOutputs exported to {output_directory}")
    list_models(directory=output_directory, show_available=True, show_supported=False)

idw ¶

idw(
    input_z,
    input_x,
    input_y,
    output_x,
    output_y,
    p=1,
    k=10,
    max_dist=None,
    k_min=1,
    epsilon=1e-12,
)

Perform Inverse Distance Weighting (IDW) interpolation.

This function performs fast IDW interpolation by creating a KDTree from the input coordinates then uses it to find the k nearest neighbors for each output point. Weights are calculated based on the inverse distance to each neighbor, with weights descreasing with increasing distance.

Code inspired by: https://github.com/DahnJ/REM-xarray

Parameters:

Name	Type	Description	Default
`input_z` ¶	`array - like`	Array of values at the input points. This can be either a 1-dimensional array, or a 2-dimensional array where each column (axis=1) represents a different set of values to be interpolated.	required
`input_x` ¶	`array - like`	Array of x-coordinates of the input points.	required
`input_y` ¶	`array - like`	Array of y-coordinates of the input points.	required
`output_x` ¶	`array - like`	Array of x-coordinates where the interpolation is to be computed.	required
`output_y` ¶	`array - like`	Array of y-coordinates where the interpolation is to be computed.	required
`p` ¶	`int or float`	Power function parameter defining how rapidly weightings should decrease as distance increases. Higher values of `p` will cause weights for distant points to decrease rapidly, resulting in nearby points having more influence on predictions. Defaults to 1.	`1`
`k` ¶	`int`	Number of nearest neighbors to use for interpolation. `k=1` is equivalent to "nearest" neighbour interpolation. Defaults to 10.	`10`
`max_dist` ¶	`int or float`	Restrict neighbouring points to less than this distance. By default, no distance limit is applied.	`None`
`k_min` ¶	`int`	If `max_dist` is provided, some points may end up with less than `k` nearest neighbours, potentially producing less reliable interpolations. Set `k_min` to set any points with less than `k_min` neighbours to NaN. Defaults to 1.	`1`
`epsilon` ¶	`float`	Small value added to distances to prevent division by zero errors in the case that output coordinates are identical to input coordinates. Defaults to 1e-12.	`1e-12`

Returns:

Name	Type	Description
`interp_values`	`ndarray`	Interpolated values at the output coordinates. If `input_z` is 1-dimensional, `interp_values` will also be 1-dimensional. If `input_z` is 2-dimensional, `interp_values` will have the same number of rows as `input_z`, with each column (axis=1) representing interpolated values for one set of input data.

Examples:

>>> input_z = [1, 2, 3, 4, 5]
>>> input_x = [0, 1, 2, 3, 4]
>>> input_y = [0, 1, 2, 3, 4]
>>> output_x = [0.5, 1.5, 2.5]
>>> output_y = [0.5, 1.5, 2.5]
>>> idw(input_z, input_x, input_y, output_x, output_y, k=2)
array([1.5, 2.5, 3.5])

Source code in eo_tides/utils.py

def idw(
    input_z,
    input_x,
    input_y,
    output_x,
    output_y,
    p=1,
    k=10,
    max_dist=None,
    k_min=1,
    epsilon=1e-12,
):
    """Perform Inverse Distance Weighting (IDW) interpolation.

    This function performs fast IDW interpolation by creating a KDTree
    from the input coordinates then uses it to find the `k` nearest
    neighbors for each output point. Weights are calculated based on the
    inverse distance to each neighbor, with weights descreasing with
    increasing distance.

    Code inspired by: <https://github.com/DahnJ/REM-xarray>

    Parameters
    ----------
    input_z : array-like
        Array of values at the input points. This can be either a
        1-dimensional array, or a 2-dimensional array where each column
        (axis=1) represents a different set of values to be interpolated.
    input_x : array-like
        Array of x-coordinates of the input points.
    input_y : array-like
        Array of y-coordinates of the input points.
    output_x : array-like
        Array of x-coordinates where the interpolation is to be computed.
    output_y : array-like
        Array of y-coordinates where the interpolation is to be computed.
    p : int or float, optional
        Power function parameter defining how rapidly weightings should
        decrease as distance increases. Higher values of `p` will cause
        weights for distant points to decrease rapidly, resulting in
        nearby points having more influence on predictions. Defaults to 1.
    k : int, optional
        Number of nearest neighbors to use for interpolation. `k=1` is
        equivalent to "nearest" neighbour interpolation. Defaults to 10.
    max_dist : int or float, optional
        Restrict neighbouring points to less than this distance.
        By default, no distance limit is applied.
    k_min : int, optional
        If `max_dist` is provided, some points may end up with less than
        `k` nearest neighbours, potentially producing less reliable
        interpolations. Set `k_min` to set any points with less than
        `k_min` neighbours to NaN. Defaults to 1.
    epsilon : float, optional
        Small value added to distances to prevent division by zero
        errors in the case that output coordinates are identical to
        input coordinates. Defaults to 1e-12.

    Returns
    -------
    interp_values : numpy.ndarray
        Interpolated values at the output coordinates. If `input_z` is
        1-dimensional, `interp_values` will also be 1-dimensional. If
        `input_z` is 2-dimensional, `interp_values` will have the same
        number of rows as `input_z`, with each column (axis=1)
        representing interpolated values for one set of input data.

    Examples
    --------
    >>> input_z = [1, 2, 3, 4, 5]
    >>> input_x = [0, 1, 2, 3, 4]
    >>> input_y = [0, 1, 2, 3, 4]
    >>> output_x = [0.5, 1.5, 2.5]
    >>> output_y = [0.5, 1.5, 2.5]
    >>> idw(input_z, input_x, input_y, output_x, output_y, k=2)
    array([1.5, 2.5, 3.5])

    """
    # Convert to numpy arrays
    input_x = np.atleast_1d(input_x)
    input_y = np.atleast_1d(input_y)
    input_z = np.atleast_1d(input_z)
    output_x = np.atleast_1d(output_x)
    output_y = np.atleast_1d(output_y)

    # Verify input and outputs have matching lengths
    if not (input_z.shape[0] == len(input_x) == len(input_y)):
        err_msg = "All of `input_z`, `input_x` and `input_y` must be the same length."
        raise ValueError(err_msg)
    if len(output_x) != len(output_y):
        err_msg = "Both `output_x` and `output_y` must be the same length."
        raise ValueError(err_msg)

    # Verify k is smaller than total number of points, and non-zero
    if k > input_z.shape[0]:
        err_msg = (
            f"The requested number of nearest neighbours (`k={k}`) "
            f"is smaller than the total number of points ({input_z.shape[0]}).",
        )
        raise ValueError(err_msg)
    if k == 0:
        err_msg = "Interpolation based on `k=0` nearest neighbours is not valid."
        raise ValueError(err_msg)

    # Create KDTree to efficiently find nearest neighbours
    points_xy = np.column_stack((input_y, input_x))
    tree = KDTree(points_xy)

    # Determine nearest neighbours and distances to each
    grid_stacked = np.column_stack((output_y, output_x))
    distances, indices = tree.query(grid_stacked, k=k, workers=-1)

    # If k == 1, add an additional axis for consistency
    if k == 1:
        distances = distances[..., np.newaxis]
        indices = indices[..., np.newaxis]

    # Add small epsilon to distances to prevent division by zero errors
    # if output coordinates are the same as input coordinates
    distances = np.maximum(distances, epsilon)

    # Set distances above max to NaN if specified
    if max_dist is not None:
        distances[distances > max_dist] = np.nan

    # Calculate weights based on distance to k nearest neighbours.
    weights = 1 / np.power(distances, p)
    weights = weights / np.nansum(weights, axis=1).reshape(-1, 1)

    # 1D case: Compute weighted sum of input_z values for each output point
    if input_z.ndim == 1:
        interp_values = np.nansum(weights * input_z[indices], axis=1)

    # 2D case: Compute weighted sum for each set of input_z values
    # weights[..., np.newaxis] adds a dimension for broadcasting
    else:
        interp_values = np.nansum(
            weights[..., np.newaxis] * input_z[indices],
            axis=1,
        )

    # Set any points with less than `k_min` valid weights to NaN
    interp_values[np.isfinite(weights).sum(axis=1) < k_min] = np.nan

    return interp_values

list_models ¶

list_models(
    directory=None,
    show_available=True,
    show_supported=True,
    raise_error=False,
    extra_databases=None,
)

List all tide models available for tide modelling.

This function scans the specified tide model directory and returns a list of models that are available in the directory as well as the full list of all models supported by eo-tides and pyTMD.

For tide model setup instructions, refer to the guide: https://geoscienceaustralia.github.io/eo-tides/setup/

Parameters:

Name	Type	Description	Default
`directory` ¶	`str`	The directory containing tide model data files. If no path is provided, this will default to the environment variable `EO_TIDES_TIDE_MODELS` if set, or raise an error if not. Tide modelling files should be stored in sub-folders for each model that match the structure required by `pyTMD` (https://geoscienceaustralia.github.io/eo-tides/setup/).	`None`
`show_available` ¶	`bool`	Whether to print a list of locally available models.	`True`
`show_supported` ¶	`bool`	Whether to print a list of all supported models, in addition to models available locally.	`True`
`raise_error` ¶	`bool`	If True, raise an error if no available models are found. If False, raise a warning.	`False`
`extra_databases` ¶	`str or path or list`	Additional custom tide model definitions to load, provided as dictionaries or paths to JSON database files. Use this to enable custom tide models not included with `pyTMD`. See: https://pytmd.readthedocs.io/en/latest/getting_started/Getting-Started.html#model-database	`None`

Returns:

Name	Type	Description
`available_models`	`list of str`	A list of all tide models available within `directory`.
`supported_models`	`list of str`	A list of all tide models supported by `eo-tides`.

Source code in eo_tides/utils.py

def list_models(
    directory: str | os.PathLike | None = None,
    show_available: bool = True,
    show_supported: bool = True,
    raise_error: bool = False,
    extra_databases: str | os.PathLike | list | None = None,
) -> tuple[list[str], list[str]]:
    """List all tide models available for tide modelling.

    This function scans the specified tide model directory
    and returns a list of models that are available in the
    directory as well as the full list of all models supported
    by `eo-tides` and `pyTMD`.

    For tide model setup instructions, refer to the guide:
    https://geoscienceaustralia.github.io/eo-tides/setup/

    Parameters
    ----------
    directory : str, optional
        The directory containing tide model data files. If no path is
        provided, this will default to the environment variable
        `EO_TIDES_TIDE_MODELS` if set, or raise an error if not.
        Tide modelling files should be stored in sub-folders for each
        model that match the structure required by `pyTMD`
        (<https://geoscienceaustralia.github.io/eo-tides/setup/>).
    show_available : bool, optional
        Whether to print a list of locally available models.
    show_supported : bool, optional
        Whether to print a list of all supported models, in
        addition to models available locally.
    raise_error : bool, optional
        If True, raise an error if no available models are found.
        If False, raise a warning.
    extra_databases : str or path or list, optional
        Additional custom tide model definitions to load, provided as
        dictionaries or paths to JSON database files. Use this to
        enable custom tide models not included with `pyTMD`.
        See: https://pytmd.readthedocs.io/en/latest/getting_started/Getting-Started.html#model-database

    Returns
    -------
    available_models : list of str
        A list of all tide models available within `directory`.
    supported_models : list of str
        A list of all tide models supported by `eo-tides`.

    """
    init()  # Initialize colorama

    # Set tide modelling files directory. If no custom path is
    # provided, try global environment variable.
    directory = _set_directory(directory)

    # Load supported models from pyTMD database, adding extras if required
    extra_databases = [] if extra_databases is None else extra_databases
    model_database = load_database(extra_databases=extra_databases)["elevation"]

    # Get full list of supported models
    supported_models = list(model_database.keys())

    # Extract expected model paths
    expected_paths = {}
    for m in supported_models:
        model_file = model_database[m]["model_file"]

        # Handle GOT5.6 differently to ensure we test for presence of GOT5.6 constituents
        if m in ("GOT5.6", "GOT5.6_extrapolated"):
            model_file = next(file for file in model_file if "GOT5.6" in file)
        else:
            model_file = model_file[0] if isinstance(model_file, list) else model_file

        # Add expected path to dict, adding directory prefix
        expected_paths[m] = str(directory / pathlib.Path(model_file).expanduser().parent)

    # Define column widths
    status_width = 4  # Width for emoji
    name_width = max(len(name) for name in supported_models)
    path_width = max(len(path) for path in expected_paths.values())

    # Print list of supported models, marking available and
    # unavailable models and appending available to list
    if show_available or show_supported:
        total_width = min(status_width + name_width + path_width + 6, 80)
        print("─" * total_width)
        print(f"{'󠀠🌊':^{status_width}} | {'Model':<{name_width}} | {'Expected path':<{path_width}}")
        print("─" * total_width)

    available_models = []
    for m in supported_models:
        try:
            # Load model
            model_file = pyTMD.io.model(directory=directory, extra_databases=extra_databases).elevation(m=m)

            # Append model to list of available model
            available_models.append(m)

            if show_available:
                # Mark available models with a green tick
                status = "✅"
                print(f"{status:^{status_width}}│ {m:<{name_width}} │ {expected_paths[m]:<{path_width}}")
        except FileNotFoundError:  # noqa: PERF203
            if show_supported:
                # Mark unavailable models with a red cross
                status = "❌"
                print(
                    f"{status:^{status_width}}│ {Style.DIM}{m:<{name_width}} │ {expected_paths[m]:<{path_width}}{Style.RESET_ALL}",
                )

    if show_available or show_supported:
        print("─" * total_width)

        # Print summary
        print(f"\n{Style.BRIGHT}Summary:{Style.RESET_ALL}")
        print(f"Available models: {len(available_models)}/{len(supported_models)}")

    # Raise error or warning if no models are available
    if not available_models:
        warning_msg = textwrap.dedent(
            f"""
            No valid tide models were found in `{directory}`.
            Please ensure that the path you provided is correct, or set the `EO_TIDES_TIDE_MODELS` environment variable to point to a valid tide model directory.
            For tide model setup instructions, refer to the guide: https://geoscienceaustralia.github.io/eo-tides/setup/
            """,
        ).strip()

        if raise_error:
            raise Exception(warning_msg) from None
        warnings.warn(warning_msg, UserWarning, stacklevel=2)

    # Return list of available and supported models
    return available_models, supported_models

API reference

eo_tides.model ¶

ensemble_tides ¶

tide_df ¶

crs ¶

ensemble_models ¶

ensemble_func ¶

ensemble_top_n ¶

ranking_points ¶

ranking_valid_perc ¶

**idw_kwargs ¶

model_phases ¶

x ¶

y ¶

time ¶

model ¶

directory ¶

time_offset ¶

return_tides ¶

**model_tides_kwargs ¶

model_tides ¶

x ¶

y ¶

time ¶

model ¶

directory ¶

crs ¶

mode ¶

output_format ¶

output_units ¶

method ¶

extrapolate ¶

cutoff ¶

crop ¶

crop_buffer ¶

append_node ¶

constituents ¶

parallel ¶

parallel_splits ¶

parallel_max ¶

ensemble_models ¶

extra_databases ¶

**ensemble_kwargs ¶

eo_tides.eo ¶

pixel_tides ¶

data ¶

time ¶

model ¶

directory ¶

resample ¶

calculate_quantiles ¶

resolution ¶

buffer ¶

resample_method ¶

dask_chunks ¶

dask_compute ¶

**model_tides_kwargs ¶

tag_tides ¶

data ¶

time ¶

model ¶

directory ¶

tidepost_lat ¶

tidepost_lon ¶

return_phases ¶

**model_tides_kwargs ¶

eo_tides.stats ¶

pixel_stats ¶

data ¶

time ¶

model ¶

directory ¶

resample ¶

modelled_freq ¶

min_max_q ¶

resample_method ¶

dask_chunks ¶

dask_compute ¶

extrapolate ¶

cutoff ¶

`tide_df` ¶

`crs` ¶

`ensemble_models` ¶

`ensemble_func` ¶

`ensemble_top_n` ¶

`ranking_points` ¶

`ranking_valid_perc` ¶

`**idw_kwargs` ¶

`x` ¶

`y` ¶

`time` ¶

`model` ¶

`directory` ¶

`time_offset` ¶

`return_tides` ¶

`**model_tides_kwargs` ¶

`x` ¶

`y` ¶

`time` ¶

`model` ¶

`directory` ¶

`crs` ¶

`mode` ¶

`output_format` ¶

`output_units` ¶

`method` ¶

`extrapolate` ¶

`cutoff` ¶

`crop` ¶

`crop_buffer` ¶

`append_node` ¶

`constituents` ¶

`parallel` ¶

`parallel_splits` ¶

`parallel_max` ¶

`ensemble_models` ¶

`extra_databases` ¶

`**ensemble_kwargs` ¶

`data` ¶

`time` ¶

`model` ¶

`directory` ¶

`resample` ¶

`calculate_quantiles` ¶

`resolution` ¶

`buffer` ¶

`resample_method` ¶

`dask_chunks` ¶

`dask_compute` ¶

`**model_tides_kwargs` ¶

`data` ¶

`time` ¶

`model` ¶

`directory` ¶

`tidepost_lat` ¶

`tidepost_lon` ¶

`return_phases` ¶

`**model_tides_kwargs` ¶

`data` ¶

`time` ¶

`model` ¶

`directory` ¶

`resample` ¶

`modelled_freq` ¶

`min_max_q` ¶

`resample_method` ¶

`dask_chunks` ¶

`dask_compute` ¶

`extrapolate` ¶

`cutoff` ¶

`**pixel_tides_kwargs` ¶

`satellites` ¶

`c` ¶

`units` ¶

`max_inf` ¶

`style` ¶

`data` ¶

`time` ¶

`model` ¶

`directory` ¶