Utility Functions
This section describes the main utility functions and their parameters, which may be of interest to advanced users.
The utilities module provides convenience functions for working with profiling datasets.
-
nPYc.utilities.rsd(data)
Calculate the percentage relative standard deviation (RSD) for each column in data.
\(\mathit{{rsd(x)}} = \frac{\mathit{\sigma_{x}}}{\mathit{\mu_{x}}} \times 100\)
Where an RSD cannot be calculated (i.e. the column mean is zero), numpy.finfo(numpy.float64).max is returned.
Parameters: data (numpy.ndarray) – n by m numpy array of data, with features in columns and samples in rows
Returns: m vector of RSDs
Return type: numpy.ndarray
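The formula and the zero-mean rule above can be illustrated with a small standalone sketch. This is not the nPYc implementation: the name `rsd_sketch` is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the text does not say which estimator the library uses.

```python
import numpy as np

def rsd_sketch(data, ddof=0):
    """Percentage RSD per column (hypothetical helper, not part of nPYc).

    ddof=0 (population standard deviation) is an assumption.
    """
    data = np.asarray(data, dtype=np.float64)
    mean = np.mean(data, axis=0)
    std = np.std(data, axis=0, ddof=ddof)
    # Pre-fill with the sentinel so zero-mean columns are never divided.
    out = np.full(mean.shape, np.finfo(np.float64).max)
    np.divide(std, mean, out=out, where=mean != 0)
    # Scale only the columns that were actually computed.
    out[mean != 0] *= 100
    return out

# Two samples (rows) by three features (columns).
example = np.array([[1.0, 2.0, -1.0],
                    [3.0, 2.0, 1.0]])
rsds = rsd_sketch(example)
```

With this input the first column has mean 2 and standard deviation 1, giving an RSD of 50; the second column is constant, giving 0; the third has a mean of zero, so the sentinel value is returned for it.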
-
nPYc.utilities.buildFileList(filepath, pattern)
Search for data files by attempting to match each file path against the regex pattern.
Parameters: - filepath (str) – Look for data in all the directories under this location
- pattern (re.SRE_Pattern) – Recognise experimental data by matching the path to this compiled regex
Returns: A list of all paths below filepath that matched pattern
Return type: list[str,]
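A recursive search of this kind might look like the following sketch. It is an illustration built on the standard library only, not the nPYc code: `build_file_list_sketch` is a hypothetical name, and the sketch matches file paths only, whereas the real function may also consider directories.

```python
import os
import re
import tempfile

def build_file_list_sketch(filepath, pattern):
    """Collect every file path under filepath that the compiled regex matches.

    Hypothetical helper; only regular files are tested against the pattern.
    """
    matches = []
    for root, _dirs, files in os.walk(filepath):
        for name in files:
            path = os.path.join(root, name)
            if pattern.search(path):
                matches.append(path)
    # Sort for a reproducible order, since os.walk order is platform dependent.
    return sorted(matches)

# Build a throwaway directory tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "run1"))
    for rel in ("run1/sample_A.csv", "run1/notes.txt", "sample_B.csv"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    found = build_file_list_sketch(tmp, re.compile(r"\.csv$"))
    found_names = [os.path.basename(p) for p in found]
```

Compiling the pattern once with `re.compile` and passing the compiled object mirrors the documented signature, which expects a compiled regex rather than a pattern string.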
-
nPYc.utilities.sequentialPrecision(data)
Calculate the percentage sequential precision for each column in data. Sequential precision for feature \(x\) is defined as:
\(\mathit{{sp(x)}} = \frac{\sqrt{(\frac{1}{n-1} \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2)/2}}{\mu_{x}} \times 100\)
Parameters: data (numpy.ndarray) – n by m numpy array of measures, with features in columns and samples in rows
Returns: m vector of sequential precision measures
Return type: numpy.ndarray
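The definition above translates almost directly into NumPy. The sketch below follows the formula term by term; `sequential_precision_sketch` is a hypothetical name, and zero-mean columns are not handled because the text does not say how the library treats them.

```python
import numpy as np

def sequential_precision_sketch(data):
    """Percentage sequential precision per column (hypothetical helper).

    Assumes rows are in acquisition order and no column has a mean of zero.
    """
    data = np.asarray(data, dtype=np.float64)
    n = data.shape[0]
    # Successive differences x_{i+1} - x_i, shape (n-1, m).
    diffs = np.diff(data, axis=0)
    # Mean squared successive difference, with the 1/(n-1) factor.
    msd = np.sum(diffs ** 2, axis=0) / (n - 1)
    return np.sqrt(msd / 2) / np.mean(data, axis=0) * 100

# Three samples (rows) by two features (columns).
example = np.array([[1.0, 2.0],
                    [3.0, 2.0],
                    [5.0, 2.0]])
sp = sequential_precision_sketch(example)
```

For the first column the successive differences are both 2, so the numerator is \(\sqrt{4/2} = \sqrt{2}\) and the result is \(\sqrt{2}/3 \times 100\). Because the measure is built from differences between neighbouring samples, the order of the rows matters, unlike for the RSD.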
-
nPYc.utilities.rsdsBySampleType(dataset, onlyPrecisionReferences=True, useColumn='SampleType')
Return percentage RSDs calculated for each distinct class value in useColumn, which defaults to the SampleType enums in ‘SampleType’.
Parameters: - dataset (Dataset) – Dataset object to generate RSDs for.
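- onlyPrecisionReferences (bool) – If True, only use samples with the ‘AssayRole’ PrecisionReference
- useColumn (str) – Column whose distinct class values define the groups; defaults to ‘SampleType’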
Returns: Dict of RSDs for each group
Return type: dict(str:numpy array)
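The grouping idea can be sketched without a Dataset object by pairing a plain intensity array with a list of per-sample labels. Everything here is illustrative: `rsds_by_group_sketch`, the label values and the array layout are assumptions, the population standard deviation is assumed, and the onlyPrecisionReferences filter is not modelled.

```python
import numpy as np

def rsds_by_group_sketch(intensities, labels):
    """Percentage RSD per column for each distinct label (hypothetical helper).

    intensities: n by m array, samples in rows; labels: n group names.
    """
    intensities = np.asarray(intensities, dtype=np.float64)
    labels = np.asarray(labels)
    result = {}
    for group in np.unique(labels):
        rows = intensities[labels == group]
        mean = rows.mean(axis=0)
        std = rows.std(axis=0)  # population standard deviation (assumption)
        # Zero-mean columns keep the float64 max sentinel.
        out = np.full(mean.shape, np.finfo(np.float64).max)
        np.divide(std, mean, out=out, where=mean != 0)
        out[mean != 0] *= 100
        result[str(group)] = out
    return result

# Four samples by two features, with made-up group labels.
intensities = np.array([[10.0, 4.0],
                        [10.0, 6.0],
                        [1.0, 2.0],
                        [3.0, 2.0]])
labels = ["pool", "pool", "sample", "sample"]
grouped = rsds_by_group_sketch(intensities, labels)
```

The result has the same shape as the documented return value: a dict keyed by group name, with one m vector of RSDs per group.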