StatInterface package¶
Submodules¶
StatInterface.GenerateDistributions module¶
GenerateDistributions
– generate distributions of parameters¶
Generate the cumulative distribution functions (CDFs) for a given parameter for each cell in the lat-lon grid (defined by gridLimit and gridSpace). This uses the method of kernel density estimators to determine the distributions.
-
class
GenerateDistributions
(configFile, gridLimit, gridSpace, gridInc, kdeType, minSamplesCell=40, missingValue=9223372036854775807)¶ Bases:
object
Generate the cumulative distribution functions (CDFs) for a given parameter for each cell in the lat-lon grid (defined by gridLimit and gridSpace). This uses the method of kernel density estimators to determine the distributions. The methods allow for extraction of parameter values from the parameter files created in DataProcess.DataProcess, and calculation (and saving) of distributions.
- Parameters
configFile (str) – Path to configuration file.
gridLimit (dict) – The bounds of the model domain. The dict should contain the keys xMin, xMax, yMin and yMax. The x variable bounds the longitude and the y variable bounds the latitude.
gridSpace (dict) – The default grid cell size. The dict should contain the keys x and y. The x variable defines the longitudinal grid size, and the y variable defines the latitudinal size.
gridInc (dict) – The increment in grid size, for those cells that do not contain sufficient observations for generating distributions. The dict should contain the keys x and y. The x variable defines the longitudinal grid increment, and the y variable defines the latitudinal increment.
kdeType (str) – Name of the (univariate) kernel estimator to use when generating the distribution. Must be one of Epanechnikov, Gaussian, Biweight or Triangular.
minSamplesCell (int) – Minimum number of valid observations required to generate distributions. If insufficient observations are found in a grid cell, then it is incrementally expanded until minSamplesCell is reached.
missingValue – Missing values have this value (default sys.maxint).
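As an illustration of the dictionary arguments described above, the following sketch builds gridLimit, gridSpace and gridInc with the documented keys and counts the cells they imply. Only the key names come from this page; the numeric values and the count_cells helper are hypothetical and are not part of TCRM.

```python
import math

# Hypothetical domain; only the key names are taken from the documentation.
gridLimit = {"xMin": 90.0, "xMax": 180.0, "yMin": -30.0, "yMax": -5.0}
gridSpace = {"x": 1.0, "y": 1.0}   # default cell size in degrees
gridInc = {"x": 1.0, "y": 0.5}     # growth step for under-populated cells


def count_cells(grid_limit, grid_space):
    """Number of cells along x and y for a regular lat-lon grid (illustrative)."""
    nx = math.ceil((grid_limit["xMax"] - grid_limit["xMin"]) / grid_space["x"])
    ny = math.ceil((grid_limit["yMax"] - grid_limit["yMin"]) / grid_space["y"])
    return nx, ny


nx, ny = count_cells(gridLimit, gridSpace)
```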
-
allDistributions
(lonLat, parameterList, parameterName=None, kdeStep=0.1, angular=False, periodic=False, plotParam=False)¶ Calculate a distribution for each individual cell and store in a file or return the distribution.
- Parameters
lonLat (str or numpy.ndarray) – The longitude/latitude of all observations in the model domain. If a string is given, then it is the path to a file containing the longitude/latitude information. If an array is given, then it should be a 2-d array containing the data values.
parameterList (str or numpy.ndarray) – Parameter values. If a string is given, then it is the path to a file containing the values. If an array is passed, then it should hold the parameter values.
parameterName (str) – Optional. If given, then the cell distributions will be saved to a file with this name. If absent, the distribution values are returned.
kdeStep (float, default=0.1) – Increment of the ordinate values at which the distributions will be calculated.
angular (boolean, default=False) – Does the data represent an angular measure (e.g. bearing)?
periodic (boolean or float, default=False) – Does the data represent some form of periodic data (e.g. day of year)? If given, it should be the period of the data (e.g. for annual data, periodic=365).
plotParam (boolean) – Plot the parameters. Default is False.
- Returns
None if parameterName is given (data are saved to file), otherwise a numpy.ndarray of the distribution values.
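A minimal sketch of turning one cell's observations into a PDF and CDF with a Gaussian kernel, evaluated at increments of kdeStep. This is not the TCRM implementation: the function name, the choice of a Gaussian kernel and the normal reference rule used for the default bandwidth are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kde_cdf(samples, kde_step=0.1, bandwidth=None):
    """Return (grid, pdf, cdf) for 1-d samples using a Gaussian kernel."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal reference rule of thumb (an assumption, not TCRM's rule).
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1.0 / 5.0)
    # Evaluate a little beyond the data so the kernel tails are captured.
    lo = samples.min() - 3.0 * bandwidth
    hi = samples.max() + 3.0 * bandwidth
    grid = np.arange(lo, hi + kde_step, kde_step)
    # Sum one kernel per observation at every grid point.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    pdf = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))
    # Integrate the PDF numerically and normalise so the CDF ends at 1.
    cdf = np.cumsum(pdf) * kde_step
    cdf /= cdf[-1]
    return grid, pdf, cdf
```

The CDF is renormalised because the grid only covers a finite range, so the raw cumulative sum falls slightly short of one.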
-
extractParameter
(cellNum)¶ Extracts the cyclone parameter data for the given cell. If the population of a cell is insufficient for generating a PDF, the bounds of the cell are expanded until the population is sufficient.
Null/missing values are removed.
- Parameters
cellNum (int) – The cell number to process.
- Returns
None. The parameter attribute is updated.
- Raises
IndexError – If the cell number is not valid (i.e. if it is outside the possible range of cell numbers).
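The cell expansion described for extractParameter can be sketched as follows. This is an illustration under stated assumptions: the helper name, the symmetric growth of the cell on all four sides, and the max_expansions safety limit are hypothetical and do not come from TCRM.

```python
import numpy as np


def extract_cell_values(lon, lat, values, bounds, grid_inc, min_samples,
                        missing_value, max_expansions=50):
    """Return valid values inside `bounds`, growing the cell until it holds
    at least `min_samples` observations (or the expansion limit is hit)."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    values = np.asarray(values, dtype=float)
    west, east = bounds["xMin"], bounds["xMax"]
    south, north = bounds["yMin"], bounds["yMax"]
    valid = values != missing_value  # drop null/missing observations
    for _ in range(max_expansions + 1):
        inside = valid & (lon >= west) & (lon < east) & (lat >= south) & (lat < north)
        if inside.sum() >= min_samples:
            break
        # Not enough observations: widen the cell by one increment per side.
        west -= grid_inc["x"]
        east += grid_inc["x"]
        south -= grid_inc["y"]
        north += grid_inc["y"]
    return values[inside]
```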
StatInterface.KDEOrigin module¶
KDEOrigin
– kernel density estimation for genesis probability¶
Calculate a genesis probability distribution, based on the observed genesis locations and applying a 2-d kernel density estimation method.
-
class
KDEOrigin
(configFile, gridLimit, kdeStep, lonLat=None, progressbar=None)¶ Bases:
object
Initialise the class for generating the genesis probability distribution. Initialisation will load the required data (genesis locations) and calculate the optimum bandwidth for the kernel density method.
- Parameters
configFile (str) – Path to the configuration file.
gridLimit (dict) – The bounds of the model domain. The dict should contain the keys xMin, xMax, yMin and yMax. The x variable bounds the longitude and the y variable bounds the latitude.
kdeStep (float) – Increment of the ordinate values at which the distributions will be calculated. Default=0.1.
lonLat (numpy.ndarray) – If given, a 2-d array of the longitude and latitude of genesis locations. If not given, attempt to load an init_lon_lat file from the processed files.
progressbar (Utilities.progressbar object) – A SimpleProgressBar() object to print progress to STDOUT.
-
generateCdf
(save=False)¶ Generate the CDF corresponding to the PDF of cyclone origins, and optionally save it to file.
- Parameters
save (boolean) – If True, save the CDF to a netCDF file called ‘originCDF.nc’. If False, return the CDF.
-
generateKDE
(save=False, plot=False)¶ Generate the PDF for cyclone origins using the kernel density estimation technique, and optionally save it to file.
- Parameters
bw (float) – Optional, bandwidth to use for generating the PDF. If not specified, use the bw attribute.
save (boolean) – If True, save the resulting PDF to a netCDF file called ‘originPDF.nc’.
plot (boolean) – If True, plot the resulting PDF.
- Returns
The x and y grid and the PDF values.
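To make the idea of a 2-d kernel density estimate on the model grid concrete, here is a small NumPy sketch using a product of Gaussian kernels. It is an assumption-laden illustration, not the KDEOrigin code: the function name, the product-Gaussian kernel, the (bx, by) bandwidth pair and the (latitude, longitude) array layout are all chosen here for clarity.

```python
import numpy as np


def gaussian_kde_2d(lon_lat, grid_limit, kde_step, bandwidth):
    """Return (x, y, pdf) where pdf has shape (len(y), len(x))."""
    lon_lat = np.asarray(lon_lat, dtype=float)
    bx, by = bandwidth
    x = np.arange(grid_limit["xMin"], grid_limit["xMax"] + kde_step, kde_step)
    y = np.arange(grid_limit["yMin"], grid_limit["yMax"] + kde_step, kde_step)
    # One 1-d kernel per observation along each axis: shapes (n, nx) and (n, ny).
    kx = np.exp(-0.5 * ((x[None, :] - lon_lat[:, 0:1]) / bx) ** 2)
    ky = np.exp(-0.5 * ((y[None, :] - lon_lat[:, 1:2]) / by) ** 2)
    # The matrix product sums the per-observation outer products ky_i * kx_i.
    pdf = ky.T @ kx / (lon_lat.shape[0] * 2.0 * np.pi * bx * by)
    return x, y, pdf
```

A CDF in the sense of generateCdf could then be accumulated from these PDF values, though the exact accumulation order used by TCRM is not stated here.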
-
updateProgressBar
(step, stepMax)¶ Callback function to update progress bar from C code
- Parameters
step (int) – Current step.
stepMax (int) – Maximum step.
-
getOriginBandwidth
(data)¶ Calculate the optimal bandwidth for kernel density estimation from data.
- Parameters
data – numpy.ndarray of data points for training data.
- Returns
Bandwidth parameter.
StatInterface.KDEParameters module¶
KDEParameters
– generate KDE of cyclone parameters¶
Generates the probability density functions (using kernel density estimation) of given cyclone parameters (speed, pressure, bearing, etc). Each of these PDFs is converted to a cumulative distribution function for use in other sections.
Note
In changing from the previous KPDF module to statsmodels, the bandwidth calculation gives substantially different values for univariate data. For test data, the updated functions give a smaller bandwidth value compared to KPDF.
-
class
KDEParameters
(kdeType)¶ Bases:
object
Generates the probability density functions (using kernel density estimation) of given cyclone parameters (speed, pressure, bearing, etc). Each of these PDFs is converted to a cumulative distribution function for use in other sections.
- Parameters
kdeType (str) – Name of the (univariate) kernel estimator to use when generating the distribution. Must be one of Epanechnikov, Gaussian, Biweight or Triangular.
-
generateGenesisDateCDF
(genDays, lonLat, bw=None, genesisKDE=None)¶ Calculate the PDF of genesis day using KDEs. Since the data is periodic, we use a simple method to include the periodicity in estimating the PDF. We prepend and append the data to itself, then use the central third of the PDF and multiply by three to obtain the required PDF. Probably not quite exact, but it should be sufficient for our purposes.
- Parameters
genDays (str) – Name of file containing genesis days (as day of year).
lonLat (numpy.ndarray) – Array of genesis longitudes and latitudes.
bw (float) – Optional. Bandwidth of the KDE to use.
genesisKDE (str) – Optional. File name to save resulting CDF to.
- Returns
numpy.ndarray
containing the days, the PDF and CDF of the genesis days.
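The prepend-and-append trick described for generateGenesisDateCDF can be sketched with plain NumPy. The data are tiled one period to either side, the density is evaluated only over the central period, and the result is multiplied by three so that it integrates to one. The function name, the Gaussian kernel and the default bandwidth of 10 days are assumptions for illustration, not values taken from TCRM.

```python
import numpy as np


def periodic_gaussian_kde(days, period=365.0, step=1.0, bandwidth=10.0):
    """Return (grid, pdf, cdf) for periodic data such as day of year."""
    days = np.asarray(days, dtype=float)
    # Prepend and append a shifted copy of the data to mimic periodicity.
    tiled = np.concatenate([days - period, days, days + period])
    # Only the central third of the tiled range is evaluated.
    grid = np.arange(0.0, period, step)
    z = (grid[:, None] - tiled[None, :]) / bandwidth
    pdf_tiled = np.exp(-0.5 * z ** 2).sum(axis=1) / (
        tiled.size * bandwidth * np.sqrt(2.0 * np.pi))
    # The tiled density carries one third of its mass per period.
    pdf = 3.0 * pdf_tiled
    cdf = np.cumsum(pdf) * step
    cdf /= cdf[-1]
    return grid, pdf, cdf
```

With this approach, observations in late December contribute density to early January and vice versa, which a non-periodic estimate would miss.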
-
generateKDE
(parameters, kdeStep, kdeParameters=None, cdfParameters=None, angular=False, periodic=False, missingValue=9223372036854775807)¶ Generate a PDF and CDF for a given parameter set using the method of kernel density estimators. Optionally return the PDF and CDF as an array, or write both to separate files.
- Parameters
parameters – Parameter values. If a string is given, then it is the path to a file containing the values. If an array is passed, then it should hold the parameter values.
kdeStep (float, default=0.1) – Increment of the ordinate values at which the distributions will be calculated.
kdeParameters (str) – Optional. If given, then the cell distributions will be saved to a file with this name. If absent, the distribution values are returned.
cdfParameters (str) – Optional. If given, then the cell distributions will be saved to a file with this name. If absent, the distribution values are returned.
angular (boolean, default=False) – Does the data represent an angular measure (e.g. bearing)?
periodic (boolean or int, default=False) – Does the data represent some form of periodic data (e.g. day of year)? If given, it should be the period of the data (e.g. for annual data, periodic=365).
missingValue – Missing values have this value (default sys.maxint).
- Returns
If kdeParameters is given, returns None (data are saved to file), otherwise a numpy.ndarray of the parameter grid, the PDF and CDF.
StatInterface.SamplingOrigin module¶
SamplingOrigin
– Generate random TC origins¶
Define the class for sampling tropical cyclone origins.
-
class
SamplingOrigin
(kdeOrigin=None, x=None, y=None)¶ Bases:
object
Class for generating samples of TC origins.
- Parameters
kdeOrigin (str or numpy.ndarray) – Name of a file containing TC genesis PDF data, or a 2-d array containing the PDF.
x (numpy.ndarray) – Longitude coordinates of the grid on which the PDF is defined.
y (numpy.ndarray) – Latitude coordinates of the grid on which the PDF is defined.
-
cdf
(xx, yy)¶ Return CDF value at the given location.
- Parameters
xx (float) – x-coordinate.
yy (float) – y-coordinate.
- Returns
CDF values for x & y at the given location.
-
generateOneSample
()¶ Generate a random cyclone origin.
-
generateSamples
(ns, outputFile=None)¶ Generate random samples of cyclone origins.
- Parameters
ns (int) – Number of samples to generate.
outputFile (str) – If given, save the samples to the file.
- Returns
numpy.ndarray containing longitude and latitude of a random sample of TC origins.
- Raises
ValueError – If ns <= 0.
IndexError – If an invalid index is returned when generating uniform random values.
-
ppf
(q1, q2)¶ Percent point function on 2-d grid (inverse of CDF).
- Parameters
q1 (float) – Quantile for the x-coordinate.
q2 (float) – Quantile for the y-coordinate.
- Returns
Longitude & latitude of the given quantile values.
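One common way to invert a gridded 2-d distribution, shown here only as a sketch, is to pick the longitude from the marginal CDF and then the latitude from the CDF conditional on that longitude. Whether SamplingOrigin.ppf uses this decomposition is not stated on this page; the function name and the (latitude, longitude) layout of the PDF array are assumptions.

```python
import numpy as np


def sample_origin(pdf, x, y, q1, q2):
    """Map quantiles (q1, q2) in [0, 1) to a grid point (lon, lat).

    `pdf` is assumed to have shape (len(y), len(x)).
    """
    pdf = np.asarray(pdf, dtype=float)
    # Marginal distribution along x, accumulated into a CDF ending at 1.
    cdf_x = np.cumsum(pdf.sum(axis=0))
    cdf_x /= cdf_x[-1]
    ix = min(int(np.searchsorted(cdf_x, q1, side="right")), len(x) - 1)
    # Conditional distribution along y for the selected column.
    cdf_y = np.cumsum(pdf[:, ix])
    cdf_y /= cdf_y[-1]
    iy = min(int(np.searchsorted(cdf_y, q2, side="right")), len(y) - 1)
    return x[ix], y[iy]
```

Feeding two independent uniform random numbers in as q1 and q2 yields a random origin distributed according to the gridded PDF.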
-
setKDEOrigins
(kdeOriginX=None, kdeOriginY=None, kdeOriginZ=None, outputPath=None)¶ Set kernel density estimation origin parameters.
- Parameters
kdeOriginX (str or numpy.ndarray) – x coordinates of the KDE result generated from KDEOrigin.
kdeOriginY (str or numpy.ndarray) – y coordinates of the KDE result generated from KDEOrigin.
kdeOriginZ (str or numpy.ndarray) – z coordinates of the KDE result generated from KDEOrigin.
outputPath (str) – Path to the output folder from which to load the PDF file.
StatInterface.SamplingParameters module¶
SamplingParameters
– Sample TC parameters from distributions¶
Defines the class for sampling cyclone parameters. Can generate either a single sample, or an array of samples (for multiple cyclones).
-
class
SamplingParameters
(cdfParameters=None)¶ Bases:
object
Provides methods to sample one or many values from a CDF of parameter values.
- Parameters
cdfParameters – Name of a file containing the CDF of a parameter, or the actual CDF values.
-
generateOneSample
()¶ Generate a single random sample of cyclone parameters.
-
generateSamples
(ns, sample_parameter_path=None)¶ Generate random samples of cyclone initial parameters.
- Parameters
ns (int) – Number of samples to generate.
sample_parameter_path (str) – Path to a file to save the sampled parameter values to.
- Returns
The sample values.
- Raises
ValueError – If ns <= 0.
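Sampling values from a tabulated CDF is typically done by inverse transform sampling, sketched below. The function name, the rng argument and the use of linear interpolation are assumptions for illustration; the sketch also assumes the CDF is strictly increasing over the supplied grid.

```python
import numpy as np


def sample_from_cdf(grid, cdf, ns, rng=None):
    """Draw `ns` values whose distribution follows the tabulated CDF."""
    if ns <= 0:
        raise ValueError("ns must be a positive integer")
    rng = np.random.default_rng() if rng is None else rng
    # Uniform random numbers are treated as quantiles of the CDF ...
    u = rng.random(ns)
    # ... and mapped back to parameter values by interpolating the inverse.
    return np.interp(u, cdf, grid)
```

Passing a seeded generator (for example np.random.default_rng(1)) makes the samples reproducible.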
-
setParameters
(cdfParameters)¶ Set parameters.
- Parameters
cdfParameters – Name of a file containing the CDF of a parameter, or the actual CDF values.
- Raises
IOError – If the CDF files do not exist.
StatInterface.StatInterface module¶
StatInterface
– statistical analysis of input datasets¶
-
class
StatInterface
(configFile, autoCalc_gridLimit=None, progressbar=None)¶ Bases:
object
Main interface to the statistical analysis module of TCRM. This module generates cumulative distribution functions and probability density functions of the various parameters, largely using kernel density estimation methods.
- Parameters
configFile (str) – Path to configuration file.
autoCalc_gridLimit – Function to calculate the extent of a domain.
progressbar – A SimpleProgressBar() object to print progress to STDOUT.
-
calcCellStatistics
(minSample=100)¶ Calculate the cell statistics for speed, bearing, pressure, and pressure rate of change for all the grid cells in the domain.
The statistics calculated are mean, variance, and autocorrelation.
The cell statistics are calculated on a grid defined by gridLimit, gridSpace and gridInc using an instance of StatInterface.generateStats.GenerateStats.
An optional minSample (default=100) can be given which sets the minimum number of observations in a given cell to calculate the statistics.
-
cdfCellBearing
()¶ Generate CDFs relating to the bearing of cyclones for each grid cell in the model domain.
-
cdfCellPressure
()¶ Generate CDFs relating to the pressures of cyclones in each grid cell in the model domain.
-
cdfCellSize
()¶ Generate CDFs relating to the size (radius of maximum wind) of cyclones in each grid cell in the model domain.
-
cdfCellSpeed
()¶ Generate CDFs relating to the speed of motion of cyclones for each grid cell in the model domain.
-
kdeGenesisDate
()¶ Generate CDFs relating to the genesis day-of-year of cyclones for each grid cell in the model domain.
-
kdeOrigin
()¶ Generate 2D PDFs relating to the origin of cyclones.
StatInterface.circularKDE module¶
StatInterface.generateStats module¶
generateStats
– calculation of statistical values¶
-
class
GenerateStats
(parameter, lonLat, gridLimit, gridSpace, gridInc, minSample=100, angular=False, missingValue=9223372036854775807, progressbar=None, prgStartValue=0, prgEndValue=1, calculateLater=False)¶ Bases:
object
Generate the main statistical distributions across the grid domain.
- Parameters
parameter (numpy.ndarray or str) – Contains the data on which the statistical values will be based. If a str, then it is the name of a file that contains the data.
lonLat (numpy.ndarray or str) – Contains the longitude and latitude of each of the observations in the numpy.ndarray parameter.
gridLimit (dict) – The bounds of the model domain. The dict should contain the keys xMin, xMax, yMin and yMax. The x variable bounds the longitude and the y variable bounds the latitude.
gridSpace (dict) – The default grid cell size. The dict should contain the keys x and y. The x variable defines the longitudinal grid size, and the y variable defines the latitudinal size.
gridInc (dict) – The increment in grid size, for those cells that do not contain sufficient observations for generating distributions. The dict should contain the keys x and y. The x variable defines the longitudinal grid increment, and the y variable defines the latitudinal increment.
minSample (int) – Minimum number of valid observations required to generate distributions. If insufficient observations are found in a grid cell, then it is incrementally expanded until minSample is reached.
angular (boolean) – If True, the data represents an angular variable (e.g. bearings). Default is False.
-
calculate
(cellNum, onLand)¶ Calculate the required statistics (mean, variance, autocorrelation and regularized anomaly coefficient) for the given cell.
- Parameters
cellNum (int) – The cell number to process.
onLand (boolean) – If True, then the cell is (mostly or entirely) over land. If False, the cell is over water.
- Returns
mean, standard deviation, autocorrelation, residual correlation and the minimum parameter value.
-
calculateStatistics
()¶ Cycle through the cells and calculate the statistics for the variable.
-
extractParameter
(cellNum, onLand)¶ Extracts the cyclone parameter data for the given cell. If the population of a cell is insufficient for generating a PDF, the bounds of the cell are expanded until the population is sufficient.
Null/missing values are removed.
- Parameters
cellNum (int) – The cell number to process.
- Returns
None. The parameter attribute is updated.
- Raises
IndexError – If the cell number is not valid (i.e. if it is outside the possible range of cell numbers).
-
load
(filename)¶ Load pre-calculated statistics from a netcdf file.
- Parameters
filename (str) – Path to the netcdf-format file containing the statistics.
-
plotStatistics
(output_file)¶
-
save
(filename, description='')¶ Save parameters to a netcdf file for later access.
- Parameters
filename (str) – Path to the netcdf file to be created.
description (str) – Name of the parameter.
-
acf
(p, nlags=1)¶ Autocorrelation coefficient
- Parameters
p (1-d numpy.ndarray) – Array of values for which to calculate the autocorrelation coefficient.
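For reference, a lagged autocorrelation coefficient can be estimated by correlating a series with a shifted copy of itself. The sketch below is one such estimator and is not necessarily what acf does internally; the function name and the convention of returning a leading 1.0 for lag zero are assumptions.

```python
import numpy as np


def lagged_autocorrelation(p, nlags=1):
    """Return correlation coefficients for lags 0..nlags of a 1-d series."""
    p = np.asarray(p, dtype=float)
    coeffs = [1.0]  # a series is perfectly correlated with itself at lag 0
    for lag in range(1, nlags + 1):
        # Correlate the series with itself shifted by `lag` steps.
        coeffs.append(np.corrcoef(p[:-lag], p[lag:])[0, 1])
    return np.array(coeffs)
```

For a strictly alternating series the lag-1 coefficient is -1 and the lag-2 coefficient is +1, which makes a convenient sanity check.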
-
class
parameters
(numCells)¶ Bases:
object
Create an object that holds numpy.ndarrays of the statistical properties of each grid cell. There are numpy.ndarrays for both land and sea cells.
mu/lmu : array of mean values of parameter for each grid cell
sig/lsig : array of variance values of parameter for each grid cell
alpha/lalpha : array of autoregression coefficients of parameter for each grid cell
phi/lphi : array of normalisation values for random variations