StatInterface package
Submodules
StatInterface.GenerateDistributions module
GenerateDistributions – generate distributions of parameters
Generate the cumulative distribution functions (CDFs) for a given parameter for each cell in the lat-lon grid (defined by gridLimit and gridSpace). This uses kernel density estimation to determine the distributions.
class GenerateDistributions(configFile, gridLimit, gridSpace, gridInc, kdeType, minSamplesCell=40, missingValue=9223372036854775807)
- Bases: object
- Generate the cumulative distribution functions (CDFs) for a given parameter for each cell in the lat-lon grid (defined by gridLimit and gridSpace). This uses kernel density estimation to determine the distributions. The methods allow for extraction of parameter values from the parameter files created in DataProcess.DataProcess, and calculation (and saving) of distributions.
- Parameters
- configFile (str) – Path to configuration file.
- gridLimit (dict) – The bounds of the model domain. The dict should contain the keys xMin, xMax, yMin and yMax. The x variable bounds the longitude and the y variable bounds the latitude.
- gridSpace (dict) – The default grid cell size. The dict should contain the keys x and y. The x variable defines the longitudinal grid size, and the y variable defines the latitudinal size.
- gridInc (dict) – The increment in grid size for cells that do not contain sufficient observations to generate distributions. The dict should contain the keys x and y. The x variable defines the longitudinal grid increment, and the y variable defines the latitudinal increment.
- kdeType (str) – Name of the (univariate) kernel estimator to use when generating the distribution. Must be one of Epanechnikov, Gaussian, Biweight or Triangular.
- minSamplesCell (int) – Minimum number of valid observations required to generate distributions. If insufficient observations are found in a grid cell, the cell is incrementally expanded until minSamplesCell is reached.
- missingValue – Missing values have this value (default sys.maxint).
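As an illustration of how gridLimit and gridSpace partition the domain, the sketch below builds the two dictionaries and maps a longitude/latitude to a cell number. The helper names gridShape and cellNumber, the example values, and the row-major numbering (west to east, north to south) are assumptions for illustration only; this page does not state how GenerateDistributions numbers its cells.

```python
import math

# Example domain settings (illustrative values only).
gridLimit = {"xMin": 90.0, "xMax": 180.0, "yMin": -40.0, "yMax": 0.0}
gridSpace = {"x": 1.0, "y": 1.0}


def gridShape(gridLimit, gridSpace):
    """Number of cells along x (longitude) and y (latitude)."""
    nx = int(round((gridLimit["xMax"] - gridLimit["xMin"]) / gridSpace["x"]))
    ny = int(round((gridLimit["yMax"] - gridLimit["yMin"]) / gridSpace["y"]))
    return nx, ny


def cellNumber(lon, lat, gridLimit, gridSpace):
    """Hypothetical row-major cell index: columns run west to east,
    rows run north to south, starting from (xMin, yMax)."""
    nx, ny = gridShape(gridLimit, gridSpace)
    col = math.floor((lon - gridLimit["xMin"]) / gridSpace["x"])
    row = math.floor((gridLimit["yMax"] - lat) / gridSpace["y"])
    if not (0 <= col < nx and 0 <= row < ny):
        raise IndexError("location lies outside gridLimit")
    return row * nx + col
```

With these settings the grid is 90 by 40 cells, and cellNumber(179.5, -39.5, gridLimit, gridSpace) gives 3599, the south-east corner cell.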
 
allDistributions(lonLat, parameterList, parameterName=None, kdeStep=0.1, angular=False, periodic=False, plotParam=False)
- Calculate a distribution for each individual cell, and either store the distributions in a file or return them.
- Parameters
- lonLat (str or numpy.ndarray) – The longitude/latitude of all observations in the model domain. If a string is given, it is the path to a file containing the longitude/latitude information. If an array is given, it should be a 2-d array containing the data values.
- parameterList (str or numpy.ndarray) – Parameter values. If a string is given, it is the path to a file containing the values. If an array is passed, it should hold the parameter values.
- parameterName (str) – Optional. If given, the cell distributions will be saved to a file with this name. If absent, the distribution values are returned.
- kdeStep (float, default=0.1) – Increment of the ordinate values at which the distributions will be calculated.
- angular (boolean, default=False) – Whether the data represent an angular measure (e.g. bearing).
- periodic (boolean or float, default=False) – Whether the data represent some form of periodic data (e.g. day of year). If given, it should be the period of the data (e.g. for annual data, periodic=365).
- plotParam (boolean) – Plot the parameters. Default is False.
 
- Returns
- If parameterName is given, returns None (data are saved to file); otherwise returns a numpy.ndarray.
 
extractParameter(cellNum)
- Extracts the cyclone parameter data for the given cell. If the population of a cell is insufficient for generating a PDF, the bounds of the cell are expanded until the population is sufficient.
- Null/missing values are removed.
- Parameters
- cellNum (int) – The cell number to process.
- Returns
- None. The parameter attribute is updated.
- Raises
- IndexError – if the cell number is not valid (i.e. if it is outside the possible range of cell numbers).
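The cell-expansion behaviour described for extractParameter can be sketched as follows. This is not the TCRM implementation: the function name extractCell, the (xMin, xMax, yMin, yMax) tuple for the cell bounds, widening by gridInc on every side, and the maxIter safety limit are all assumptions.

```python
import numpy as np


def extractCell(lon, lat, values, bounds, gridInc, minSamples=40,
                missingValue=9223372036854775807, maxIter=100):
    """Hypothetical sketch: return the valid values inside bounds,
    widening the box by gridInc on every side until at least
    minSamples observations are found."""
    lon, lat, values = (np.asarray(a, dtype=float) for a in (lon, lat, values))
    xMin, xMax, yMin, yMax = bounds
    valid = values != missingValue        # drop null/missing observations
    for _ in range(maxIter):
        inCell = (valid & (lon >= xMin) & (lon < xMax)
                  & (lat >= yMin) & (lat < yMax))
        if inCell.sum() >= minSamples:
            return values[inCell], (xMin, xMax, yMin, yMax)
        # Not enough data: expand the cell and try again.
        xMin, xMax = xMin - gridInc["x"], xMax + gridInc["x"]
        yMin, yMax = yMin - gridInc["y"], yMax + gridInc["y"]
    raise ValueError("minSamples not reached after maxIter expansions")
```

Returning the final bounds alongside the values makes it easy to check how far a sparse cell had to grow.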
 
 
StatInterface.KDEOrigin module
KDEOrigin – kernel density estimation for genesis probability
Calculate a genesis probability distribution, based on the observed genesis locations and applying a 2-d kernel density estimation method.
class KDEOrigin(configFile, gridLimit, kdeStep, lonLat=None, progressbar=None)
- Bases: object
- Initialise the class for generating the genesis probability distribution. Initialisation will load the required data (genesis locations) and calculate the optimum bandwidth for the kernel density method.
- Parameters
- configFile (str) – Path to the configuration file.
- gridLimit (dict) – The bounds of the model domain. The dict should contain the keys xMin, xMax, yMin and yMax. The x variable bounds the longitude and the y variable bounds the latitude.
- kdeStep (float) – Increment of the ordinate values at which the distributions will be calculated. Default=0.1.
- lonLat (numpy.ndarray) – If given, a 2-d array of the longitude and latitude of genesis locations. If not given, attempt to load an init_lon_lat file from the processed files.
- progressbar (Utilities.progressbar object) – A SimpleProgressBar() object to print progress to STDOUT.
 
generateCdf(save=False)
- Generate the CDFs corresponding to the PDFs of cyclone origins, then save them to a file in the path provided by the user (or return them).
- Parameters
- save (boolean) – If True, save the CDF to a netCDF file called ‘originCDF.nc’. If False, return the CDF.
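A simple way to turn a gridded PDF into a CDF that can be sampled is to normalise the cell values and accumulate them over the flattened grid. The function gridCdf is a hypothetical sketch of that idea; this page does not describe the ordering or normalisation that generateCdf actually uses.

```python
import numpy as np


def gridCdf(pdf):
    """Hypothetical sketch: cumulative probabilities over a flattened
    2-d PDF grid. On a regular grid each cell's probability is
    proportional to its PDF value, so normalising the values suffices."""
    p = np.asarray(pdf, dtype=float).ravel()
    cdf = np.cumsum(p / p.sum())
    cdf[-1] = 1.0                      # remove rounding error at the end
    return cdf.reshape(np.shape(pdf))
```

For example, gridCdf([[1.0, 1.0], [2.0, 0.0]]) gives [[0.25, 0.5], [1.0, 1.0]].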
 
generateKDE(save=False, plot=False)
- Generate the PDF of cyclone origins using kernel density estimation, then save it to a file in the path provided by the user.
- Parameters
- bw (float) – Optional. Bandwidth to use for generating the PDF. If not specified, use the bw attribute.
- save (boolean) – If True, save the resulting PDF to a netCDF file called ‘originPDF.nc’.
- plot (boolean) – If True, plot the resulting PDF.

- Returns
- x and y grid and the PDF values.
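The core of a 2-d kernel density estimate on the x and y grid is an average of kernels centred on each genesis location. The sketch below assumes a Gaussian product kernel with a single bandwidth bw for both axes; the name genesisPdf is hypothetical, and the kernel and bandwidth that KDEOrigin actually uses may differ.

```python
import numpy as np


def genesisPdf(lonLat, x, y, bw):
    """Hypothetical sketch: Gaussian product-kernel KDE of genesis points
    (lonLat has shape (n, 2)) evaluated on the grid spanned by x and y."""
    lonLat = np.asarray(lonLat, dtype=float)
    xx, yy = np.meshgrid(x, y)                   # shape (len(y), len(x))
    # Scaled distance from every grid point to every observation.
    dx = (xx[..., None] - lonLat[:, 0]) / bw
    dy = (yy[..., None] - lonLat[:, 1]) / bw
    kernel = np.exp(-0.5 * (dx**2 + dy**2)) / (2.0 * np.pi * bw**2)
    return xx, yy, kernel.mean(axis=-1)          # average over observations
```

This brute-force form holds a (len(y), len(x), n) array in memory, so it suits small grids; it mirrors the return description (x and y grid plus PDF values).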
 
updateProgressBar(step, stepMax)
- Callback function to update the progress bar from C code.
- Parameters
- step (int) – Current step.
- stepMax (int) – Maximum step.
 
 
 
getOriginBandwidth(data)
- Calculate the optimal bandwidth for kernel density estimation from data.
- Parameters
- data – numpy.ndarray of data points for training data.
- Returns
- Bandwidth parameter.
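This page does not say which estimator getOriginBandwidth uses. As a stand-in, Scott's rule of thumb gives one bandwidth per dimension, h_j = σ_j n^(-1/(d+4)), where n is the number of points and d the number of dimensions; scipy.stats.gaussian_kde uses the same n^(-1/(d+4)) factor by default. The function scottBandwidth is a hypothetical illustration, not the module's method.

```python
import numpy as np


def scottBandwidth(data):
    """Scott's rule of thumb: h_j = sigma_j * n ** (-1 / (d + 4)) for each
    column j of an (n, d) array (a 1-d input is treated as d = 1)."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    n, d = data.shape
    return data.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
```

For 1000 two-dimensional points the factor is 1000^(-1/6), roughly 0.32, so each bandwidth is about a third of that column's standard deviation.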
 
StatInterface.KDEParameters module
KDEParameters – generate KDE of cyclone parameters
Generates the probability density functions (using kernel density estimation) of given cyclone parameters (speed, pressure, bearing, etc.). Each of these PDFs is converted to a cumulative distribution function for use in other sections.
Note
In changing from the previous KPDF module to statsmodels, the bandwidth calculation gives substantially different values for univariate data. For test data, the updated functions give a smaller bandwidth value compared to KPDF.
class KDEParameters(kdeType)
- Bases: object
- Generates the probability density functions (using kernel density estimation) of given cyclone parameters (speed, pressure, bearing, etc.). Each of these PDFs is converted to a cumulative distribution function for use in other sections.
- Parameters
- kdeType (str) – Name of the (univariate) kernel estimator to use when generating the distribution. Must be one of Epanechnikov, Gaussian, Biweight or Triangular.
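The four kernel names accepted by kdeType correspond to standard kernel functions. The sketch below writes them in their usual textbook form on a standardised argument u and plugs them into a basic univariate estimator. KERNELS and kde are hypothetical names, and the statsmodels routines that KDEParameters relies on may parameterise these kernels differently.

```python
import numpy as np


def _inside(u):
    """1 where |u| <= 1, else 0 (support of the compact kernels)."""
    return (np.abs(u) <= 1.0).astype(float)


# Textbook forms of the kernels named by kdeType (each integrates to 1).
KERNELS = {
    "Gaussian": lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi),
    "Epanechnikov": lambda u: 0.75 * (1.0 - u**2) * _inside(u),
    "Biweight": lambda u: (15.0 / 16.0) * (1.0 - u**2) ** 2 * _inside(u),
    "Triangular": lambda u: (1.0 - np.abs(u)) * _inside(u),
}


def kde(data, grid, bw, kdeType="Gaussian"):
    """Univariate KDE: average of scaled kernels centred on each datum."""
    kernel = KERNELS[kdeType]
    u = (np.asarray(grid, dtype=float)[:, None]
         - np.asarray(data, dtype=float)[None, :]) / bw
    return kernel(u).mean(axis=1) / bw
```

The three compact kernels are zero beyond one bandwidth from each datum, while the Gaussian kernel has unbounded support; in practice the bandwidth matters far more than the kernel shape.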
generateGenesisDateCDF(genDays, lonLat, bw=None, genesisKDE=None)
- Calculate the PDF of genesis day using KDEs. Since the data are periodic, we use a simple method to include the periodicity in estimating the PDF: we prepend and append the data to itself (shifted by one period in each direction), then use the central third of the PDF and multiply by three to obtain the required PDF. This is probably not exact, but it should be sufficient for our purposes.
- Parameters
- genDays (str) – Name of file containing genesis days (as day of year).
- lonLat (numpy.ndarray) – Array of genesis longitudes and latitudes.
- bw (float) – Optional. Bandwidth of the KDE to use.
- genesisKDE (str) – Optional. File name to save resulting CDF to.

- Returns
- numpy.ndarray containing the days, the PDF and CDF of the genesis days.
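The wrap-around method described for generateGenesisDateCDF can be written out directly. In this sketch the name periodicKdePdf, the Gaussian kernel, the 365-day period and the one-day grid spacing are assumptions; only the copy-left-and-right, keep-the-central-third, multiply-by-three steps come from the description above.

```python
import numpy as np


def periodicKdePdf(days, bw, period=365, step=1.0):
    """Sketch of the wrap-around trick: copy the data one period to the
    left and right, fit a Gaussian KDE to all three copies, keep the
    central period and multiply by three."""
    days = np.asarray(days, dtype=float)
    tiled = np.concatenate([days - period, days, days + period])
    grid = np.arange(0.0, period, step)            # the central third only
    u = (grid[:, None] - tiled[None, :]) / bw
    norm = tiled.size * bw * np.sqrt(2.0 * np.pi)
    pdf = np.exp(-0.5 * u**2).sum(axis=1) / norm
    pdf *= 3.0                                     # undo the 1/3 from tripling the data
    cdf = np.cumsum(pdf) * step
    return grid, pdf, cdf
```

For a season that straddles the new year, the copies make the PDF continuous between day 364 and day 0 instead of dropping towards zero at both ends.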
 
generateKDE(parameters, kdeStep, kdeParameters=None, cdfParameters=None, angular=False, periodic=False, missingValue=9223372036854775807)
- Generate a PDF and CDF for a given parameter set using the method of kernel density estimators. Optionally return the PDF and CDF as an array, or write both to separate files.
- Parameters
- parameters – Parameter values. If a string is given, it is the path to a file containing the values. If an array is passed, it should hold the parameter values.
- kdeStep (float, default=0.1) – Increment of the ordinate values at which the distributions will be calculated.
- kdeParameters (str) – Optional. If given, the PDF will be saved to a file with this name. If absent, the distribution values are returned.
- cdfParameters (str) – Optional. If given, the CDF will be saved to a file with this name. If absent, the distribution values are returned.
- angular (boolean, default=False) – Whether the data represent an angular measure (e.g. bearing).
- periodic (boolean or int, default=False) – Whether the data represent some form of periodic data (e.g. day of year). If given, it should be the period of the data (e.g. for annual data, periodic=365).
- missingValue – Missing values have this value (default sys.maxint).

- Returns
- If kdeParameters is given, returns None (data are saved to file); otherwise returns a numpy.ndarray of the parameter grid, the PDF and CDF.
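A minimal sketch of the non-periodic case of generateKDE: drop values equal to missingValue, evaluate a Gaussian KDE on a grid with spacing kdeStep, and accumulate the PDF into a CDF. The function name parameterPdfCdf, the Gaussian kernel, the Scott's-rule default bandwidth and the (m, 3) output layout are assumptions chosen to mirror the return description, not the module's actual code.

```python
import numpy as np


def parameterPdfCdf(parameters, kdeStep=0.1,
                    missingValue=9223372036854775807, bw=None):
    """Hypothetical sketch: Gaussian KDE of one parameter on a regular grid
    with spacing kdeStep. Returns an (m, 3) array of grid, PDF and CDF."""
    values = np.asarray(parameters, dtype=float)
    values = values[values != missingValue]                # drop missing values
    if bw is None:                                         # Scott's rule (assumed)
        bw = values.std(ddof=1) * values.size ** (-1.0 / 5.0)
    grid = np.arange(values.min() - 3 * bw, values.max() + 3 * bw, kdeStep)
    u = (grid[:, None] - values[None, :]) / bw
    pdf = np.exp(-0.5 * u**2).mean(axis=1) / (bw * np.sqrt(2.0 * np.pi))
    cdf = np.cumsum(pdf) * kdeStep
    cdf /= cdf[-1]                                         # make the CDF end at 1
    return np.column_stack([grid, pdf, cdf])
```

Filtering missing values before choosing the grid matters: a single sentinel value of 9223372036854775807 would otherwise stretch the grid to an unusable length.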
 
 
StatInterface.SamplingOrigin module
SamplingOrigin – Generate random TC origins
Define the class for sampling tropical cyclone origins.
class SamplingOrigin(kdeOrigin=None, x=None, y=None)
- Bases: object
- Class for generating samples of TC origins.
- Parameters
- kdeOrigin (str or numpy.ndarray) – Name of a file containing TC genesis PDF data, or a 2-d array containing the PDF.
- x (numpy.ndarray) – Longitude coordinates of the grid on which the PDF is defined.
- y (numpy.ndarray) – Latitude coordinates of the grid on which the PDF is defined.
 
cdf(xx, yy)
- Return the CDF value at the given location.
- Parameters
- xx (float) – x-coordinate.
- yy (float) – y-coordinate.

- Returns
- CDF values for x & y at the given location.
 
generateOneSample()
- Generate a random cyclone origin.
generateSamples(ns, outputFile=None)
- Generate random samples of cyclone origins.
- Parameters
- ns (int) – Number of samples to generate.
- outputFile (str) – If given, save the samples to the file.

- Returns
- numpy.ndarray containing longitude and latitude of a random sample of TC origins.
- Raises
- ValueError – If ns <= 0.
- IndexError – If an invalid index is returned when generating uniform random values.
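Drawing origins from a gridded PDF can be done by inverse-transform sampling on the cumulative sum of cell probabilities. The sketch assumes the PDF is a 2-d array indexed [latitude, longitude] and returns cell-centre coordinates; the name sampleOrigins and the rng argument are hypothetical. The np.minimum guard shows one way to avoid the out-of-range index that the IndexError above refers to.

```python
import numpy as np


def sampleOrigins(x, y, pdf, ns, rng=None):
    """Hypothetical sketch: draw ns origins from a gridded PDF with shape
    (len(y), len(x)) by inverse-transform sampling over the flattened grid."""
    if ns <= 0:
        raise ValueError("ns must be positive")
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(pdf, dtype=float).ravel()
    cdf = np.cumsum(p) / p.sum()
    idx = np.searchsorted(cdf, rng.uniform(size=ns), side="right")
    idx = np.minimum(idx, cdf.size - 1)          # keep indices inside the grid
    row, col = np.unravel_index(idx, np.shape(pdf))
    return np.column_stack([np.asarray(x)[col], np.asarray(y)[row]])
```

Using side="right" means cells with zero probability (flat steps in the CDF) are never selected.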
 
 
ppf(q1, q2)
- Percent point function on the 2-d grid (inverse of the CDF).
- Parameters
- q1 (float) – Quantile for the x-coordinate. 
- q2 (float) – Quantile for the y-coordinate. 
 
- Returns
- Longitude & latitude of the given quantile values. 
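One common way to define a 2-d percent point function from two quantiles is to invert the marginal CDF in x with q1, then invert the CDF in y conditional on the chosen x column with q2. Whether ppf follows exactly this scheme is not stated here; ppf2d is a hypothetical sketch that assumes a PDF array indexed [latitude, longitude].

```python
import numpy as np


def ppf2d(x, y, pdf, q1, q2):
    """Hypothetical sketch: map quantiles (q1, q2) to (lon, lat). q1 inverts
    the marginal CDF in x; q2 inverts the CDF in y conditional on the
    selected x column. pdf has shape (len(y), len(x))."""
    pdf = np.asarray(pdf, dtype=float)
    marginalX = np.cumsum(pdf.sum(axis=0))
    marginalX /= marginalX[-1]
    i = min(np.searchsorted(marginalX, q1), len(x) - 1)
    column = np.cumsum(pdf[:, i])                # conditional CDF in y
    column /= column[-1]
    j = min(np.searchsorted(column, q2), len(y) - 1)
    return x[i], y[j]
```

With x = [0, 1, 2], y = [10, 20] (as arrays) and pdf = [[0, 1, 0], [0, 0, 3]], ppf2d(x, y, pdf, 0.2, 0.5) returns the point (1.0, 10.0).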
 
setKDEOrigins(kdeOriginX=None, kdeOriginY=None, kdeOriginZ=None, outputPath=None)
- Set kernel density estimation origin parameters.
- Parameters
- kdeOriginX (str or numpy.ndarray) – x coordinates of the KDE result generated by KDEOrigin.
- kdeOriginY (str or numpy.ndarray) – y coordinates of the KDE result generated by KDEOrigin.
- kdeOriginZ (str or numpy.ndarray) – z coordinates of the KDE result generated by KDEOrigin.
- outputPath (str) – Path to the output folder from which to load the PDF file.
 
 
 
StatInterface.SamplingParameters module
SamplingParameters – Sample TC parameters from distributions
Defines the class for sampling cyclone parameters. Can generate either a single sample, or an array of samples (for multiple cyclones).
class SamplingParameters(cdfParameters=None)
- Bases: object
- Provides methods to sample one or many values from a CDF of parameter values.
- Parameters
- cdfParameters – Name of a file containing the CDF of a parameter, or the actual CDF values. 
generateOneSample()
- Generate a single random sample of cyclone parameters.
generateSamples(ns, sample_parameter_path=None)
- Generate random samples of cyclone initial parameters.
- Parameters
- ns (int) – Number of samples to generate. 
- sample_parameter_path (str) – Path to a file to save the sampled parameter values to. 
 
- Returns
- The sample values. 
- Raises
- ValueError – If ns <= 0. 
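Sampling from a tabulated CDF is typically done by inverse-transform sampling: draw uniform quantiles and interpolate the parameter value at each. The function sampleFromCdf is a hypothetical sketch that assumes the CDF is supplied as two increasing arrays (parameter values and cumulative probabilities) and uses linear interpolation; SamplingParameters may store or interpolate its CDF differently.

```python
import numpy as np


def sampleFromCdf(values, cdf, ns, rng=None):
    """Hypothetical sketch: inverse-transform sampling from a tabulated CDF
    by linear interpolation of the parameter value at uniform quantiles."""
    if ns <= 0:
        raise ValueError("ns must be positive")
    rng = np.random.default_rng() if rng is None else rng
    # np.interp needs the cumulative probabilities in increasing order.
    return np.interp(rng.uniform(size=ns), cdf, values)
```

Drawing a single value (the generateOneSample case) is the same call with ns=1.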
 
setParameters(cdfParameters)
- Set the CDF of parameter values used for sampling.
- Parameters
- cdfParameters – Name of a file containing the CDF of a parameter, or the actual CDF values. 
- Raises
- IOError – If the CDF files do not exist. 
 
 
StatInterface.StatInterface module
StatInterface – statistical analysis of input datasets
class StatInterface(configFile, autoCalc_gridLimit=None, progressbar=None)
- Bases: object
- Main interface to the statistical analysis module of TCRM. This module generates cumulative distribution functions and probability density functions of the various parameters, largely using kernel density estimation methods.
- Parameters
- configFile (str) – Path to configuration file.
- autoCalc_gridLimit – Function to calculate the extent of a domain.
- progressbar – A SimpleProgressBar() object to print progress to STDOUT.
 
calcCellStatistics(minSample=100)
- Calculate the cell statistics for speed, bearing, pressure, and pressure rate of change for all the grid cells in the domain.
- The statistics calculated are mean, variance, and autocorrelation.
- The cell statistics are calculated on a grid defined by gridLimit, gridSpace and gridInc using an instance of StatInterface.generateStats.GenerateStats.
- An optional minSample (default=100) can be given, which sets the minimum number of observations in a given cell to calculate the statistics.
cdfCellBearing()
- Generate CDFs relating to the bearing of cyclones for each grid cell in the model domain.
cdfCellPressure()
- Generate CDFs relating to the pressures of cyclones in each grid cell in the model domain.
cdfCellSize()
- Generate CDFs relating to the size (radius of maximum wind) of cyclones in each grid cell in the model domain.
cdfCellSpeed()
- Generate CDFs relating to the speed of motion of cyclones for each grid cell in the model domain.
kdeGenesisDate()
- Generate CDFs relating to the genesis day-of-year of cyclones for each grid cell in the model domain.
kdeOrigin()
- Generate 2-d PDFs relating to the origin of cyclones.
 
StatInterface.circularKDE module
StatInterface.generateStats module
generateStats – calculation of statistical values
class GenerateStats(parameter, lonLat, gridLimit, gridSpace, gridInc, minSample=100, angular=False, missingValue=9223372036854775807, progressbar=None, prgStartValue=0, prgEndValue=1, calculateLater=False)
- Bases: object
- Generate the main statistical distributions across the grid domain.
- Parameters
- parameter (numpy.ndarray or str) – Contains the data on which the statistical values will be based. If a str, it is the name of a file that contains the data.
- lonLat (numpy.ndarray or str) – Contains the longitude and latitude of each of the observations in the parameter array.
- gridLimit (dict) – The bounds of the model domain. The dict should contain the keys xMin, xMax, yMin and yMax. The x variable bounds the longitude and the y variable bounds the latitude.
- gridSpace (dict) – The default grid cell size. The dict should contain the keys x and y. The x variable defines the longitudinal grid size, and the y variable defines the latitudinal size.
- gridInc (dict) – The increment in grid size for cells that do not contain sufficient observations to generate distributions. The dict should contain the keys x and y. The x variable defines the longitudinal grid increment, and the y variable defines the latitudinal increment.
- minSample (int) – Minimum number of valid observations required to generate distributions. If insufficient observations are found in a grid cell, the cell is incrementally expanded until minSample is reached.
- angular (boolean) – If True, the data represent an angular variable (e.g. bearings). Default is False.
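For angular variables a plain arithmetic mean is misleading: bearings of 350° and 10° average to 180°, not 0°. A standard remedy is to average unit vectors. The sketch below computes the circular mean and the circular standard deviation sqrt(-2 ln R), where R is the mean resultant length. circularMeanStd is a hypothetical name, and this page does not specify how GenerateStats treats angular data internally.

```python
import numpy as np


def circularMeanStd(bearings):
    """Hypothetical sketch: circular mean and circular standard deviation
    of bearings in degrees, computed from the mean unit vector."""
    theta = np.radians(np.asarray(bearings, dtype=float))
    s, c = np.sin(theta).mean(), np.cos(theta).mean()
    mean = np.degrees(np.arctan2(s, c)) % 360.0
    R = np.hypot(s, c)                       # mean resultant length, 0..1
    std = np.degrees(np.sqrt(-2.0 * np.log(R)))
    return mean, std
```

For example, circularMeanStd([350, 10]) gives a mean of (numerically) 0° and a spread of about 10°.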
 
calculate(cellNum, onLand)
- Calculate the required statistics (mean, variance, autocorrelation and regularized anomaly coefficient) for the given cell.
- Parameters
- cellNum (int) – The cell number to process.
- onLand (boolean) – If True, the cell is (mostly or entirely) over land. If False, the cell is over water.
 
- Returns
- mean, standard deviation, autocorrelation, residual correlation and the minimum parameter value. 
 
calculateStatistics()
- Cycle through the cells and calculate the statistics for the variable.
extractParameter(cellNum, onLand)
- Extracts the cyclone parameter data for the given cell. If the population of a cell is insufficient for generating a PDF, the bounds of the cell are expanded until the population is sufficient.
- Null/missing values are removed.
- Parameters
- cellNum (int) – The cell number to process.
- Returns
- None. The parameter attribute is updated.
- Raises
- IndexError – if the cell number is not valid (i.e. if it is outside the possible range of cell numbers).
 
load(filename)
- Load pre-calculated statistics from a netCDF file.
- Parameters
- filename (str) – Path to the netCDF-format file containing the statistics.
 
plotStatistics(output_file)
save(filename, description='')
- Save parameters to a netCDF file for later access.
- Parameters
- filename (str) – Path to the netCDF file to be created.
- description (str) – Name of the parameter.
 
 
 
acf(p, nlags=1)
- Autocorrelation coefficient.
- Parameters
- p (1-d numpy.ndarray) – Array of values for which to calculate the autocorrelation coefficient.
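A lag-k autocorrelation can be estimated as the Pearson correlation between a series and a copy of itself shifted by k steps. lagCorrelation is a hypothetical sketch of that estimator for k = 1 to nlags; the exact estimator and return format of acf itself are not documented here.

```python
import numpy as np


def lagCorrelation(p, nlags=1):
    """Hypothetical sketch: Pearson correlation between the series and
    itself shifted by k steps, for k = 1..nlags."""
    p = np.asarray(p, dtype=float)
    return np.array([np.corrcoef(p[:-k], p[k:])[0, 1]
                     for k in range(1, nlags + 1)])
```

An alternating series such as [1, -1, 1, -1, ...] gives a lag-1 value of -1, and a steadily increasing series gives +1.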
 
class parameters(numCells)
- Bases: object
- Create an object that holds numpy.ndarrays of the statistical properties of each grid cell. There are numpy.ndarrays for both land and sea cells:
  - mu/lmu: array of mean values of the parameter for each grid cell
  - sig/lsig: array of variance values of the parameter for each grid cell
  - alpha/lalpha: array of autoregression coefficients of the parameter for each grid cell
  - phi/lphi: array of normalisation values for random variations
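The layout described for parameters (one array per statistic, with an l-prefixed twin for land cells) can be sketched as a small class. CellParameters is a hypothetical name, and initialising every array to zeros is an assumption.

```python
import numpy as np


class CellParameters:
    """Hypothetical sketch of the parameters container: one array per
    statistic for sea cells, plus an 'l'-prefixed twin for land cells."""

    # mean, variance, autoregression coefficient, normalisation value
    STATS = ("mu", "sig", "alpha", "phi")

    def __init__(self, numCells):
        self.numCells = numCells
        for name in self.STATS:
            setattr(self, name, np.zeros(numCells))         # sea cells
            setattr(self, "l" + name, np.zeros(numCells))   # land cells
```

Each attribute is an independent array, so writing to lmu never alters mu.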