StatInterface package

Submodules

StatInterface.GenerateDistributions module

GenerateDistributions – generate distributions of parameters

Generate the cumulative distribution functions (CDFs) of a given parameter for each cell in the lat-lon grid (defined by gridLimit and gridSpace). The distributions are determined using kernel density estimation.

class GenerateDistributions(configFile, gridLimit, gridSpace, gridInc, kdeType, minSamplesCell=40, missingValue=9223372036854775807)

Bases: object

Generate the cumulative distribution functions (CDFs) of a given parameter for each cell in the lat-lon grid (defined by gridLimit and gridSpace). The distributions are determined using kernel density estimation. The methods allow for extraction of parameter values from the parameter files created in DataProcess.DataProcess, and for calculation (and saving) of distributions.

Parameters
  • configFile (str) – Path to configuration file.

  • gridLimit (dict) – The bounds of the model domain. The dict should contain the keys xMin, xMax, yMin and yMax. The x variable bounds the longitude and the y variable bounds the latitude.

  • gridSpace (dict) – The default grid cell size. The dict should contain keys of x and y. The x variable defines the longitudinal grid size, and the y variable defines the latitudinal size.

  • gridInc (dict) – The increment in grid size, for those cells that do not contain sufficient observations for generating distributions. The dict should contain the keys x and y. The x variable defines the longitudinal grid increment, and the y variable defines the latitudinal increment.

  • kdeType (str) – Name of the (univariate) kernel estimator to use when generating the distribution. Must be one of Epanechnikov, Gaussian, Biweight or Triangular.

  • minSamplesCell (int) – Minimum number of valid observations required to generate distributions. If insufficient observations are found in a grid cell, then it is incrementally expanded until minSamplesCell is reached.

  • missingValue – Missing values have this value (default sys.maxint).

allDistributions(lonLat, parameterList, parameterName=None, kdeStep=0.1, angular=False, periodic=False, plotParam=False)

Calculate a distribution for each individual cell and store in a file or return the distribution.

Parameters
  • lonLat (str or numpy.ndarray) – The longitude/latitude of all observations in the model domain. If a string is given, then it is the path to a file containing the longitude/latitude information. If an array is given, then it should be a 2-d array containing the data values.

  • parameterList (str or numpy.ndarray) – Parameter values. If a string is given, then it is the path to a file containing the values. If an array is passed, then it should hold the parameter values.

  • parameterName (str) – Optional. If given, then the cell distributions will be saved to a file with this name. If absent, the distribution values are returned.

  • kdeStep (float, default=0.1) – Increment of the ordinate values at which the distributions will be calculated.

  • angular (boolean, default=False) – Does the data represent an angular measure (e.g. bearing).

  • periodic (boolean or float, default=False) – Does the data represent some form of periodic data (e.g. day of year). If given, it should be the period of the data (e.g. for annual data, periodic=365).

  • plotParam (boolean) – Plot the parameters. Default is False.

Returns

If parameterName is given, returns None (data are saved to file), otherwise numpy.ndarray.

extractParameter(cellNum)

Extracts the cyclone parameter data for the given cell. If the population of a cell is insufficient for generating a PDF, the bounds of the cell are expanded until the population is sufficient.

Null/missing values are removed.

Parameters

cellNum (int) – The cell number to process.

Returns

None. The parameter attribute is updated.

Raises

IndexError – if the cell number is not valid (i.e. if it is outside the possible range of cell numbers).
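The cell expansion described above can be sketched as follows. This is a minimal illustration rather than the TCRM implementation: the function name select_cell_samples, its arguments, and the choice to grow the cell symmetrically on all four sides are assumptions made for the example.

```python
import numpy as np

def select_cell_samples(lon, lat, values, cell, grid_inc, min_samples,
                        missing_value, max_steps=50):
    """Return the valid values inside `cell`, widening the cell until it
    holds at least `min_samples` observations (hypothetical helper)."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    values = np.asarray(values, dtype=float)
    valid = values != float(missing_value)  # drop null/missing values

    def in_cell(w, e, s, n):
        return valid & (lon >= w) & (lon < e) & (lat >= s) & (lat < n)

    w, e = cell["xMin"], cell["xMax"]
    s, n = cell["yMin"], cell["yMax"]
    inside = in_cell(w, e, s, n)
    steps = 0
    while inside.sum() < min_samples and steps < max_steps:
        # Too few observations: grow the cell by one increment per side.
        w, e = w - grid_inc["x"], e + grid_inc["x"]
        s, n = s - grid_inc["y"], n + grid_inc["y"]
        inside = in_cell(w, e, s, n)
        steps += 1
    return values[inside], {"xMin": w, "xMax": e, "yMin": s, "yMax": n}

# Synthetic observations spaced 0.1 degrees apart along one latitude.
lon = 100.05 + 0.1 * np.arange(100)
lat = np.full(100, -15.0)
speed = np.linspace(5.0, 25.0, 100)
cell = {"xMin": 104.5, "xMax": 105.5, "yMin": -15.5, "yMax": -14.5}
samples, bounds = select_cell_samples(
    lon, lat, speed, cell, {"x": 0.5, "y": 0.5},
    min_samples=25, missing_value=9223372036854775807)
```

In this synthetic case the original cell holds 10 observations, so it is widened twice before the threshold of 25 is met.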

StatInterface.KDEOrigin module

KDEOrigin – kernel density estimation for genesis probability

Calculate a genesis probability distribution, based on the observed genesis locations and applying a 2-d kernel density estimation method.

class KDEOrigin(configFile, gridLimit, kdeStep, lonLat=None, progressbar=None)

Bases: object

Initialise the class for generating the genesis probability distribution. Initialisation will load the required data (genesis locations) and calculate the optimum bandwidth for the kernel density method.

Parameters
  • configFile (str) – Path to the configuration file.

  • gridLimit (dict) – The bounds of the model domain. The dict should contain the keys xMin, xMax, yMin and yMax. The x variable bounds the longitude and the y variable bounds the latitude.

  • kdeStep (float) – Increment of the ordinate values at which the distributions will be calculated. Default is 0.1.

  • lonLat (numpy.ndarray) – If given, a 2-d array of the longitude and latitude of genesis locations. If not given, attempt to load an init_lon_lat file from the processed files.

  • progressbar (Utilities.progressbar object.) – A SimpleProgressBar() object to print progress to STDOUT.

generateCdf(save=False)

Generate the CDF corresponding to the PDF of cyclone origins and, optionally, save it to a file.

Parameters

save (boolean) – If True, save the CDF to a netCDF file called ‘originCDF.nc’. If False, return the CDF.

generateKDE(save=False, plot=False)

Generate the PDF of cyclone origins using kernel density estimation and, optionally, save it to a file.

Parameters
  • bw (float) – Optional, bandwidth to use for generating the PDF. If not specified, use the bw attribute.

  • save (boolean) – If True, save the resulting PDF to a netCDF file called ‘originPDF.nc’.

  • plot (boolean) – If True, plot the resulting PDF.

Returns

x and y grid and the PDF values.
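A two-dimensional kernel density estimate on a regular grid can be sketched with NumPy alone. The example below assumes a Gaussian product kernel with a single, user-supplied bandwidth; the kernel and bandwidth selection actually used by KDEOrigin are not specified in this document, and the function name gaussian_kde_2d is hypothetical.

```python
import numpy as np

def gaussian_kde_2d(lon_lat, grid_limit, kde_step, bw):
    """Evaluate a 2-d Gaussian product-kernel density on a lon-lat grid.

    Returns the x grid, the y grid and the PDF with shape (len(y), len(x)).
    """
    pts = np.asarray(lon_lat, dtype=float)
    nx = int(round((grid_limit["xMax"] - grid_limit["xMin"]) / kde_step)) + 1
    ny = int(round((grid_limit["yMax"] - grid_limit["yMin"]) / kde_step)) + 1
    x = grid_limit["xMin"] + kde_step * np.arange(nx)
    y = grid_limit["yMin"] + kde_step * np.arange(ny)

    # One-dimensional kernel weights of every observation at every grid line.
    kx = np.exp(-0.5 * ((x[:, None] - pts[None, :, 0]) / bw) ** 2)  # (nx, n)
    ky = np.exp(-0.5 * ((y[:, None] - pts[None, :, 1]) / bw) ** 2)  # (ny, n)

    # Sum over observations of kx * ky, normalised to integrate to one.
    pdf = (ky @ kx.T) / (2.0 * np.pi * bw ** 2 * pts.shape[0])
    return x, y, pdf

# Synthetic genesis locations clustered around (150E, 15S).
rng = np.random.default_rng(0)
genesis = rng.normal(loc=[150.0, -15.0], scale=1.0, size=(200, 2))
limits = {"xMin": 140.0, "xMax": 160.0, "yMin": -25.0, "yMax": -5.0}
x, y, pdf = gaussian_kde_2d(genesis, limits, kde_step=0.25, bw=0.5)
```

Because the product kernel separates into longitude and latitude factors, the sum over observations reduces to a single matrix product, which avoids building a three-dimensional array.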

updateProgressBar(step, stepMax)

Callback function to update the progress bar from C code.

Parameters
  • step (int) – Current step.

  • stepMax (int) – Maximum step.

getOriginBandwidth(data)

Calculate the optimal bandwidth for kernel density estimation from data.

Parameters

data (numpy.ndarray) – Data points to use as training data.

Returns

Bandwidth parameter.

StatInterface.KDEParameters module

KDEParameters – generate KDE of cyclone parameters

Generates the probability density functions (using kernel density estimation) of given cyclone parameters (speed, pressure, bearing, etc). Each of these PDFs is converted to a cumulative distribution function for use in other sections.

Note

In changing from the previous KPDF module to statsmodels, the bandwidth calculation gives substantially different values for univariate data. For test data, the updated functions give a smaller bandwidth value compared to KPDF.

class KDEParameters(kdeType)

Bases: object

Generates the probability density functions (using kernel density estimation) of given cyclone parameters (speed, pressure, bearing, etc). Each of these PDFs is converted to a cumulative distribution function for use in other sections.

Parameters

kdeType (str) – Name of the (univariate) kernel estimator to use when generating the distribution. Must be one of Epanechnikov, Gaussian, Biweight or Triangular.

generateGenesisDateCDF(genDays, lonLat, bw=None, genesisKDE=None)

Calculate the PDF of genesis day using KDEs. Since the data is periodic, we use a simple method to include the periodicity in estimating the PDF. We prepend and append the data to itself, then use the central third of the PDF and multiply by three to obtain the required PDF. Probably not quite exact, but it should be sufficient for our purposes.

Parameters
  • genDays (str) – Name of file containing genesis days (as day of year).

  • lonLat (numpy.ndarray) – Array of genesis longitudes and latitudes.

  • bw (float) – Optional. Bandwidth of the KDE to use.

  • genesisKDE (str) – Optional. File name to save resulting CDF to.

Returns

numpy.ndarray containing the days, the PDF and CDF of the genesis days.
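The prepend-and-append approach to periodic data can be illustrated with a short sketch. It assumes a Gaussian kernel and a fixed bandwidth, and the function name periodic_gaussian_pdf is hypothetical; it is not the code used by generateGenesisDateCDF.

```python
import numpy as np

def periodic_gaussian_pdf(days, bw, period=365, step=1.0):
    """Estimate a periodic PDF by tiling the data three times."""
    days = np.asarray(days, dtype=float)
    # Prepend and append shifted copies so kernels wrap across the year end.
    tiled = np.concatenate([days - period, days, days + period])
    grid = step * np.arange(int(round(period / step)) + 1)

    z = (grid[:, None] - tiled[None, :]) / bw
    pdf_tiled = np.exp(-0.5 * z ** 2).sum(axis=1)
    pdf_tiled /= tiled.size * bw * np.sqrt(2.0 * np.pi)

    # The central third holds about a third of the mass, so rescale by three.
    pdf = 3.0 * pdf_tiled
    cdf = np.cumsum(pdf) * step
    cdf /= cdf[-1]
    return grid, pdf, cdf

# Genesis days clustered around the turn of the year, plus one mid-year event.
gen_days = [5, 10, 360, 362, 1, 364, 180]
grid, pdf, cdf = periodic_gaussian_pdf(gen_days, bw=10.0)
mass = 0.5 * step_sum if (step_sum := (pdf[:-1] + pdf[1:]).sum()) else 0.0
```

With the tiling in place, the density at day 0 matches the density at day 365, and the observations from late December contribute to the estimate for early January.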

generateKDE(parameters, kdeStep, kdeParameters=None, cdfParameters=None, angular=False, periodic=False, missingValue=9223372036854775807)

Generate a PDF and CDF for a given parameter set using the method of kernel density estimators. Optionally return the PDF and CDF as an array, or write both to separate files.

Parameters
  • parameters – Parameter values. If a string is given, then it is the path to a file containing the values. If an array is passed, then it should hold the parameter values.

  • kdeStep (float, default=0.1) – Increment of the ordinate values at which the distributions will be calculated.

  • kdeParameters (str) – Optional. If given, then the cell distributions will be saved to a file with this name. If absent, the distribution values are returned.

  • cdfParameters (str) – Optional. If given, then the cell distributions will be saved to a file with this name. If absent, the distribution values are returned.

  • angular (boolean, default=False) – Does the data represent an angular measure (e.g. bearing).

  • periodic (boolean or int, default=False) – Does the data represent some form of periodic data (e.g. day of year). If given, it should be the period of the data (e.g. for annual data, periodic=365).

  • missingValue – Missing values have this value (default sys.maxint).

Returns

If kdeParameters is given, returns None (data are saved to file), otherwise numpy.ndarray of the parameter grid, the PDF and CDF.
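As a rough illustration of the PDF-to-CDF step for non-periodic data, the sketch below uses the Epanechnikov kernel, K(u) = 0.75 (1 - u^2) for |u| <= 1, one of the kernel names accepted by kdeType. The function name, the fixed bandwidth and the grid construction are assumptions for the example and do not reproduce the statsmodels-based implementation.

```python
import numpy as np

def epanechnikov_pdf_cdf(values, kde_step, bw, missing_value=None):
    """Return the grid, PDF and CDF of a univariate Epanechnikov KDE."""
    v = np.asarray(values, dtype=float)
    if missing_value is not None:
        v = v[v != float(missing_value)]  # discard missing observations

    # Grid spans the data plus one bandwidth, where the kernel reaches zero.
    grid = np.arange(v.min() - bw, v.max() + bw + kde_step, kde_step)
    u = (grid[:, None] - v[None, :]) / bw
    kern = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    pdf = kern.sum(axis=1) / (v.size * bw)

    # Integrate the PDF numerically and force the CDF to end at one.
    cdf = np.cumsum(pdf) * kde_step
    cdf /= cdf[-1]
    return grid, pdf, cdf

pressures = [990.0, 1000.0, 1005.0, 9223372036854775807]
grid, pdf, cdf = epanechnikov_pdf_cdf(
    pressures, kde_step=0.1, bw=5.0, missing_value=9223372036854775807)
```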

StatInterface.SamplingOrigin module

SamplingOrigin – Generate random TC origins

Define the class for sampling tropical cyclone origins.

class SamplingOrigin(kdeOrigin=None, x=None, y=None)

Bases: object

Class for generating samples of TC origins.

Parameters
  • kdeOrigin (str or numpy.ndarray) – Name of a file containing TC genesis PDF data, or a 2-d array containing the PDF.

  • x (numpy.ndarray) – Longitude coordinates of the grid on which the PDF is defined.

  • y (numpy.ndarray) – Latitude coordinates of the grid on which the PDF is defined.

cdf(xx, yy)

Return CDF value at the given location.

Parameters
  • xx (float) – x-coordinate.

  • yy (float) – y-coordinate.

Returns

CDF values for x & y at the given location.

generateOneSample()

Generate a random cyclone origin.

generateSamples(ns, outputFile=None)

Generate random samples of cyclone origins.

Parameters
  • ns (int) – Number of samples to generate.

  • outputFile (str) – If given, save the samples to the file.

Returns

numpy.ndarray containing longitude and latitude of a random sample of TC origins.

Raises
  • ValueError – If ns <= 0.

  • IndexError – If an invalid index is returned when generating uniform random values.

ppf(q1, q2)

Percent point function on 2-d grid (inverse of CDF).

Parameters
  • q1 (float) – Quantile for the x-coordinate.

  • q2 (float) – Quantile for the y-coordinate.

Returns

Longitude & latitude of the given quantile values.
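One way to invert a gridded 2-d distribution is to use the marginal CDF along x for the first quantile and the conditional CDF along y for the second. The sketch below follows that scheme purely as an assumption; this document does not state how SamplingOrigin.ppf is implemented, and the function name ppf_2d is hypothetical.

```python
import numpy as np

def ppf_2d(pdf, x, y, q1, q2):
    """Map quantiles (q1, q2), each in (0, 1], to a grid location.

    `pdf` has shape (len(y), len(x)).
    """
    pdf = np.asarray(pdf, dtype=float)

    # Marginal CDF along x: sum the PDF over y, then accumulate.
    cdf_x = np.cumsum(pdf.sum(axis=0))
    cdf_x /= cdf_x[-1]
    ix = min(int(np.searchsorted(cdf_x, q1)), len(x) - 1)

    # Conditional CDF along y for the selected x column.
    cdf_y = np.cumsum(pdf[:, ix])
    cdf_y /= cdf_y[-1]
    iy = min(int(np.searchsorted(cdf_y, q2)), len(y) - 1)
    return x[ix], y[iy]

# All probability mass in a single grid cell.
x = np.array([150.0, 151.0, 152.0, 153.0])
y = np.array([-20.0, -19.0, -18.0])
point_pdf = np.zeros((3, 4))
point_pdf[1, 2] = 1.0
origin = ppf_2d(point_pdf, x, y, 0.3, 0.7)

# Uniform 2 x 2 grid.
corner = ppf_2d(np.ones((2, 2)), np.array([0.0, 1.0]), np.array([0.0, 1.0]),
                0.25, 0.75)
```

Drawing q1 and q2 from a uniform distribution and passing them through such a function yields random origins distributed according to the PDF.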

setKDEOrigins(kdeOriginX=None, kdeOriginY=None, kdeOriginZ=None, outputPath=None)

Set kernel density estimation origin parameters.

Parameters
  • kdeOriginX (str or numpy.ndarray) – x coordinates of kde result generated from KDEOrigin

  • kdeOriginY (str or numpy.ndarray) – y coordinates of kde result generated from KDEOrigin

  • kdeOriginZ (str or numpy.ndarray) – z coordinates of kde result generated from KDEOrigin

  • outputPath (str) – Path to the output folder from which the PDF file is loaded.

StatInterface.SamplingParameters module

SamplingParameters – Sample TC parameters from distributions

Defines the class for sampling cyclone parameters. Can generate either a single sample, or an array of samples (for multiple cyclones).

class SamplingParameters(cdfParameters=None)

Bases: object

Provides methods to sample one or many values from a CDF of parameter values.

Parameters

cdfParameters – Name of a file containing the CDF of a parameter, or the actual CDF values.

generateOneSample()

Generate a single random sample of cyclone parameters.

generateSamples(ns, sample_parameter_path=None)

Generate random samples of cyclone initial parameters.

Parameters
  • ns (int) – Number of samples to generate.

  • sample_parameter_path (str) – Path to a file to save the sampled parameter values to.

Returns

The sample values.

Raises

ValueError – If ns <= 0.
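Sampling from a tabulated CDF is commonly done by inverse transform sampling: draw uniform random numbers and map them back through the CDF. The following sketch assumes a strictly increasing CDF tabulated on a grid of parameter values; the function name sample_from_cdf and the use of linear interpolation are illustrative choices, not a description of the TCRM code.

```python
import numpy as np

def sample_from_cdf(grid, cdf, ns, rng=None):
    """Draw `ns` samples from a tabulated, strictly increasing CDF."""
    if ns <= 0:
        raise ValueError("ns must be a positive integer")
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=ns)
    # Invert the CDF by interpolating parameter values at the uniform draws.
    return np.interp(u, cdf, grid)

# CDF of a uniform distribution on [0, 10].
grid = np.linspace(0.0, 10.0, 11)
cdf = grid / 10.0
samples = sample_from_cdf(grid, cdf, 1000, rng=np.random.default_rng(42))
```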

setParameters(cdfParameters)

Set parameters.

Parameters

cdfParameters – Name of a file containing the CDF of a parameter, or the actual CDF values.

Raises

IOError – If the CDF files do not exist.

StatInterface.StatInterface module

StatInterface – statistical analysis of input datasets

class StatInterface(configFile, autoCalc_gridLimit=None, progressbar=None)

Bases: object

Main interface to the statistical analysis module of TCRM. This module generates cumulative distribution functions and probability density functions of the various parameters, largely using kernel density estimation methods.

Parameters
  • configFile (str) – Path to configuration file.

  • autoCalc_gridLimit – function to calculate the extent of a domain.

  • progressbar – a SimpleProgressBar() object to print progress to STDOUT.

calcCellStatistics(minSample=100)

Calculate the cell statistics for speed, bearing, pressure, and pressure rate of change for all the grid cells in the domain.

The statistics calculated are mean, variance, and autocorrelation.

The cell statistics are calculated on a grid defined by gridLimit, gridSpace and gridInc using an instance of StatInterface.generateStats.GenerateStats.

An optional minSample (default=100) can be given which sets the minimum number of observations in a given cell to calculate the statistics.

cdfCellBearing()

Generate CDFs relating to the bearing of cyclones for each grid cell in the model domain.

cdfCellPressure()

Generate CDFs relating to the pressures of cyclones in each grid cell in the model domain.

cdfCellSize()

Generate CDFs relating to the size (radius of maximum wind) of cyclones in each grid cell in the model domain.

cdfCellSpeed()

Generate CDFs relating to the speed of motion of cyclones for each grid cell in the model domain.

kdeGenesisDate()

Generate CDFs relating to the genesis day-of-year of cyclones for each grid cell in the model domain.

kdeOrigin()

Generate 2D PDFs relating to the origin of cyclones.

StatInterface.circularKDE module

StatInterface.generateStats module

generateStats – calculation of statistical values

class GenerateStats(parameter, lonLat, gridLimit, gridSpace, gridInc, minSample=100, angular=False, missingValue=9223372036854775807, progressbar=None, prgStartValue=0, prgEndValue=1, calculateLater=False)

Bases: object

Generate the main statistical distributions across the grid domain.

Parameters
  • parameter (numpy.ndarray or str) – Contains the data on which the statistical values will be based. If a str, it is the name of a file that contains the data.

  • lonLat (numpy.ndarray or str) – Contains the longitude and latitude of each of the observations in the numpy.ndarray parameter.

  • gridLimit (dict) – The bounds of the model domain. The dict should contain the keys xMin, xMax, yMin and yMax. The x variable bounds the longitude and the y variable bounds the latitude.

  • gridSpace (dict) – The default grid cell size. The dict should contain keys of x and y. The x variable defines the longitudinal grid size, and the y variable defines the latitudinal size.

  • gridInc (dict) – The increment in grid size, for those cells that do not contain sufficient observations for generating distributions. The dict should contain the keys x and y. The x variable defines the longitudinal grid increment, and the y variable defines the latitudinal increment.

  • minSample (int) – Minimum number of valid observations required to generate distributions. If insufficient observations are found in a grid cell, then it is incrementally expanded until minSample is reached.

  • angular (boolean) – If True the data represents an angular variable (e.g. bearings). Default is False.
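Angular variables need separate handling because an arithmetic mean breaks down across the 0/360 boundary. A standard remedy, shown here only as an illustration and not as the method used by GenerateStats, is to average the unit vectors of the angles. The function name circular_mean_deg is hypothetical.

```python
import numpy as np

def circular_mean_deg(bearings):
    """Mean direction, in degrees within [0, 360], of a set of bearings."""
    rad = np.deg2rad(np.asarray(bearings, dtype=float))
    # Average the sine and cosine components, then recover the angle.
    mean = np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())
    return np.rad2deg(mean) % 360.0

# Two bearings either side of north: the arithmetic mean would be 180.
north = circular_mean_deg([350.0, 10.0])
east = circular_mean_deg([80.0, 100.0])
```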

calculate(cellNum, onLand)

Calculate the required statistics (mean, variance, autocorrelation and regularized anomaly coefficient) for the given cell.

Parameters
  • cellNum (int) – The cell number to process.

  • onLand (boolean) – If True, then the cell is (mostly or entirely) over land. If False, the cell is over water.

Returns

mean, standard deviation, autocorrelation, residual correlation and the minimum parameter value.

calculateStatistics()

Cycle through the cells and calculate the statistics for the variable.

extractParameter(cellNum, onLand)

Extracts the cyclone parameter data for the given cell. If the population of a cell is insufficient for generating a PDF, the bounds of the cell are expanded until the population is sufficient.

Null/missing values are removed.

Parameters

cellNum (int) – The cell number to process.

Returns

None. The parameter attribute is updated.

Raises

IndexError – if the cell number is not valid (i.e. if it is outside the possible range of cell numbers).

load(filename)

Load pre-calculated statistics from a netcdf file.

Parameters

filename (str) – Path to the netcdf-format file containing the statistics.

plotStatistics(output_file)

save(filename, description='')

Save parameters to a netcdf file for later access.

Parameters
  • filename (str) – Path to the netcdf file to be created.

  • description (str) – Name of the parameter.

acf(p, nlags=1)

Autocorrelation coefficient

Parameters

p (1-d numpy.ndarray) – array of values to calculate autocorrelation coefficient
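For reference, a sample autocorrelation coefficient can be computed as below. This is a generic textbook estimator written for illustration; the function name lag_autocorrelation and its return layout (one coefficient per lag, starting at lag 0) are assumptions and may differ from what acf returns.

```python
import numpy as np

def lag_autocorrelation(p, nlags=1):
    """Sample autocorrelation of `p` at lags 0..nlags."""
    p = np.asarray(p, dtype=float)
    d = p - p.mean()          # deviations from the series mean
    denom = np.sum(d * d)     # lag-0 sum of squares
    return np.array([np.sum(d[:p.size - k] * d[k:]) / denom
                     for k in range(nlags + 1)])

r = lag_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0])
```

For the series 1 to 5 the deviations are -2, -1, 0, 1, 2, so the lag-1 coefficient is 4 / 10 = 0.4.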

class parameters(numCells)

Bases: object

Create an object that holds numpy.ndarray instances of the statistical properties of each grid cell. There is a numpy.ndarray for both land and sea cells.

  • mu/lmu – array of mean values of the parameter for each grid cell

  • sig/lsig – array of variance values of the parameter for each grid cell

  • alpha/lalpha – array of autoregression coefficients of the parameter for each grid cell

  • phi/lphi – array of normalisation values for random variations

Module contents