Utility Functions

This section describes the main utility functions and their parameters, which may be of interest to advanced users.

The utilities module provides convenience functions for working with profiling datasets.

nPYc.utilities.rsd(data)

Calculate percentage relative standard deviation for each column in data.

\(\mathit{{rsd(x)}} = \frac{\mathit{\sigma_{x}}}{\mathit{\mu_{x}}} \times 100\)

Where an RSD cannot be calculated (i.e. the column mean is zero), numpy.finfo(numpy.float64).max is returned.

Parameters: data (numpy.ndarray) – n by m numpy array of data, with features in columns, and samples in rows
Returns: m vector of RSDs
Return type: numpy.ndarray
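
The RSD definition above can be sketched in plain NumPy. This is an illustrative reimplementation, not the nPYc source: the name rsd_sketch is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption, since the text does not say which estimator the library uses.

```python
import numpy as np


def rsd_sketch(data):
    """Percentage RSD per column (hypothetical helper, not nPYc's own code)."""
    data = np.asarray(data, dtype=np.float64)
    mean = data.mean(axis=0)
    std = data.std(axis=0)  # assumption: population standard deviation (ddof=0)

    # Columns with a mean of zero get the largest representable float64,
    # mirroring the behaviour described in the text.
    out = np.full(mean.shape, np.finfo(np.float64).max)
    nonzero = mean != 0
    out[nonzero] = std[nonzero] / mean[nonzero] * 100
    return out


# Two samples (rows) by two features (columns); the second feature is all zeros.
data = np.array([[1.0, 0.0],
                 [3.0, 0.0]])
result = rsd_sketch(data)
```

Here the first column has mean 2 and standard deviation 1, so its RSD is 50%, while the zero-mean second column receives the sentinel value.
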
nPYc.utilities.buildFileList(filepath, pattern)

Search for data files by matching each file path against the regex pattern.

Parameters:
  • filepath (str) – Look for data in all the directories under this location
  • pattern (re.SRE_Pattern) – Recognise experimental data by matching path to this compiled regex
Returns: A list of all paths below filepath that matched pattern
Return type: list[str]
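
The search described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the nPYc implementation: build_file_list_sketch is a hypothetical name, and walking the tree with os.walk, testing files only, and using pattern.search on the full path are all assumptions about behaviour the text does not specify.

```python
import os
import re
import tempfile


def build_file_list_sketch(search_directory, pattern):
    """Collect paths under search_directory that match a compiled regex.

    Hypothetical helper: the real function may match differently
    (e.g. against directories, or with match() rather than search()).
    """
    matches = []
    for root, _dirs, files in os.walk(search_directory):
        for name in files:
            path = os.path.join(root, name)
            if pattern.search(path):
                matches.append(path)
    return sorted(matches)


# Build a small throwaway directory tree to search.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "run1"))
    for parts in (("run1", "sample_A.csv"), ("run1", "notes.txt"), ("sample_B.csv",)):
        with open(os.path.join(tmp, *parts), "w") as handle:
            handle.write("")

    pattern = re.compile(r"sample_\w+\.csv$")
    found = build_file_list_sketch(tmp, pattern)
    found_names = sorted(os.path.basename(p) for p in found)
```

Compiling the pattern once and passing it in, as the documented signature does, lets the same pattern be reused across several search locations.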

nPYc.utilities.sequentialPrecision(data)

Calculate percentage sequential precision for each column in data. Sequential precision for feature \(x\) is defined as:

\(\mathit{{sp(x)}} = \frac{\sqrt{(\frac{1}{n-1} \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2)/2}}{\mu_{x}} \times 100\)

Parameters: data (numpy.ndarray) – n by m numpy array of measures, with features in columns, and samples in rows
Returns: m vector of sequential precision measures
Return type: numpy.ndarray
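
The sequential precision formula above translates fairly directly into NumPy. The following is a sketch of the formula as written, not the library's code: sequential_precision_sketch is a hypothetical name, and no special handling of zero-mean columns is included because the text does not describe any.

```python
import numpy as np


def sequential_precision_sketch(data):
    """Percentage sequential precision per column (hypothetical helper)."""
    data = np.asarray(data, dtype=np.float64)
    n = data.shape[0]

    # Differences between consecutive samples: x_{i+1} - x_i, for i = 1..n-1.
    diffs = np.diff(data, axis=0)

    # Mean squared successive difference, halved, then square-rooted.
    mean_sq_diff = np.sum(diffs ** 2, axis=0) / (n - 1)
    return np.sqrt(mean_sq_diff / 2) / data.mean(axis=0) * 100


# Four samples (rows) by two features (columns):
# column 0 alternates between 1 and 3, column 1 is constant.
data = np.array([[1.0, 5.0],
                 [3.0, 5.0],
                 [1.0, 5.0],
                 [3.0, 5.0]])
sp = sequential_precision_sketch(data)
```

Because the measure is built from differences between neighbouring samples, a constant column scores zero, while a column that jumps between runs scores high even when its overall mean is stable.
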
nPYc.utilities.rsdsBySampleType(dataset, onlyPrecisionReferences=True, useColumn='SampleType')

Return percent RSDs calculated separately for each distinct class value in useColumn, which defaults to the ‘SampleType’ column and its SampleType enums.

Parameters:
  • dataset (Dataset) – Dataset object to generate RSDs for.
  • onlyPrecisionReferences (bool) – If True, only use samples with the ‘AssayRole’ PrecisionReference
  • useColumn (str) – Column whose distinct class values define the groups; defaults to ‘SampleType’
Returns: Dict of RSDs for each group
Return type: dict(str: numpy.ndarray)
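
The grouping idea can be illustrated without a Dataset object. The sketch below is a simplification under stated assumptions: rsds_by_group_sketch is a hypothetical name, the intensity matrix and group labels are passed in as plain arrays rather than read from a Dataset, the labels "pool" and "sample" are made up, the population standard deviation (ddof=0) is assumed, and the onlyPrecisionReferences filter and zero-mean handling are omitted.

```python
import numpy as np


def rsds_by_group_sketch(intensities, labels):
    """Percentage RSD per column, computed separately for each label.

    Hypothetical helper: a stand-in for grouping a Dataset by one of
    its metadata columns.
    """
    intensities = np.asarray(intensities, dtype=np.float64)
    labels = np.asarray(labels)

    groups = {}
    for label in np.unique(labels):
        rows = intensities[labels == label]
        # assumption: population standard deviation (ddof=0)
        groups[str(label)] = rows.std(axis=0) / rows.mean(axis=0) * 100
    return groups


# Four samples by two features, with made-up group labels.
intensities = np.array([[1.0, 10.0],
                        [3.0, 30.0],
                        [2.0, 20.0],
                        [2.0, 20.0]])
labels = ["pool", "pool", "sample", "sample"]
groups = rsds_by_group_sketch(intensities, labels)
```

The result is a dict keyed by group label, with one vector of RSDs per group, which matches the shape of the documented return value.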