# Quality Assessment Reports

The nPYc-Toolbox offers a series of reports: pre-set visualisations composed of text, figures and tables that describe and summarise the characteristics of the dataset, and help the user assess the overall impact of quality control decisions (e.g. whether to exclude samples or features or change filtering criteria).

The main reporting functions include:

• Sample Summary: Presents a summary of the samples acquired, see below for details
• Feature Summary: Summarises the main properties of the dataset and method specific quality control metrics, see below for details
• Multivariate Report: Summarises the main outputs of a PCA model and any potential associations with pertinent analytical metadata, see Multivariate Analysis for full details
• Final Report: Summary report compiling information about the samples acquired, and the overall quality of the dataset
• Batch and Run-Order Correction: Specific reports for optimising and assessing correction in MSDataset, see Batch & Run-Order Correction for full details
• Feature Selection: Specific report for assessing the number of features passing quality criteria in MSDataset, see Sample and Feature Masks for full details

By default, reports are generated inline (i.e. in a Jupyter notebook) using generateReport(). However, reports can also be saved as HTML documents with static images by supplying a destination path, for example:

saveDir = '/path to save outputs'
nPYc.reports.generateReport(dataset, 'feature summary', destinationPath=saveDir)


The HTML versions of the reports use Jinja2 templates. The default report templates are saved in the Templates directory and may be customised if required.

By default, reports are generated on the full sample and feature complement of each dataset. However, reports can also be generated on only those samples and features not set to be masked from the dataset (i.e. with sampleMask and featureMask values set to True, see Sample and Feature Masks) by running with withExclusions=True, for example:

nPYc.reports.generateReport(dataset, 'feature summary', withExclusions=True)


In this way, samples and features can be iteratively masked/included, and the impact of masking visualised before the masks are finally applied and samples and/or features permanently excluded from the dataset.
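The mask semantics described above can be pictured with plain Python lists. This is a minimal sketch and not the toolbox's implementation: `apply_masks` and the toy intensity matrix are hypothetical, and the only behaviour taken from the text is that a mask value of True marks a sample or feature that is kept.

```python
def apply_masks(intensity, sample_mask, feature_mask):
    """Return the rows (samples) and columns (features) whose mask value is True."""
    return [
        [value for value, keep_feature in zip(row, feature_mask) if keep_feature]
        for row, keep_sample in zip(intensity, sample_mask)
        if keep_sample
    ]


# Toy data: 3 samples x 4 features (illustrative values only)
intensity = [
    [10.0, 20.0, 30.0, 40.0],
    [11.0, 21.0, 31.0, 41.0],
    [12.0, 22.0, 32.0, 42.0],
]
sample_mask = [True, False, True]         # second sample marked for exclusion
feature_mask = [True, True, False, True]  # third feature marked for exclusion

# What a report run "with exclusions" would look at in this toy case
masked_view = apply_masks(intensity, sample_mask, feature_mask)
print(masked_view)
```

The original matrix is left untouched, which mirrors the idea that masks can be revised repeatedly before anything is permanently removed.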

Throughout the reports, we reference the various QC sample types included in every dataset to enable the characterisation of data quality, see Sample Metadata for full details.

## Sample Summary Report

The sample summary report can be used to check the expected samples against those acquired, in terms of numbers, sample type, and any samples either missing from acquisition or not recorded in the sample metadata CSV file:

nPYc.reports.generateReport(msData, 'sample summary')


The main underlying function parameters are as follows:

class nPYc.reports._generateSampleReport

Summarise samples in the dataset.

Generate the sample summary report, which lists the samples acquired plus, if possible, those missing based on the expected sample manifest.

Parameters:

• dataTrue (Dataset) – Dataset to report on
• withExclusions (bool) – If True, only report on features and samples not masked by the sample and feature masks
• destinationPath (None or str) – If None, run interactively, else a str specifying the directory to save the report into
• returnOutput (bool) – If True, returns a dictionary of all tables generated during the run

Returns: Optional, dictionary of all tables generated during the run

## Feature Summary Report: LC-MS Datasets

The LC-MS feature summary report provides visualisations summarising the quality of the dataset with regard to the quality control criteria previously described in Lewis et al [1], and can be run using:

nPYc.reports.generateReport(msData, 'feature summary')


The visualisations include both an assessment of potential run-order and batch effects, and metrics by which feature quality can be assessed. In order, these consist of:

• Feature abundance (Figure 1)
• Sum of total ion count, TIC (Figures 2 and 3)
• Correlation to dilution (Figures 4, 5 and 7)
• Relative standard deviation, RSD (Figures 6, 7 and 9)
• Chromatographic peak width, if available (Figure 8)
• Ion map (Figure 10)

For several of these parameters (for example, correlation to dilution, RSD), acceptable default values are pre-defined in the configuration SOP, see Built-in Configuration SOPs for details. If different values are required, these can be set by the user by modifying the SOP, during data import, or directly at any point while running the pipeline. For more information, see Datasets and for examples, Installation and Tutorials.

The following sections describe how the quality for each of these is assessed.

Feature abundance

The histogram of feature abundance shows the distribution of mean abundance by sample type for each feature (Figure 1).

While a normal distribution is expected for the SS and SR samples, if your study includes LTR samples (QC samples from a different source to the study) it can be the case that a subset of features are not present in these samples. If this is the case, and it is required to limit the feature set to those detected in your LTR samples, features not found in this set could be excluded based on their intensity (see Sample and Feature Masks for details). If an unexpected distribution is observed this should be investigated, for example, by going back to XCMS feature extraction parameters.

Sum of total ion count, TIC

The TIC plot shows the summed intensity of all feature integrals for each sample (Figure 2) and provides insight into potential run-order and batch effects.

By plotting the TIC for each sample (ordered by acquisition date) any broad trends in overall sample intensity can be observed. With LC-MS it is usual to see a gradual decline in TIC across the run owing to increasing inefficiencies in ion detection (from source and ion optic contamination), alongside large jumps if data is acquired in multiple batches, both of which can be mitigated (at least in part) by run-order and batch correction (see Batch & Run-Order Correction).
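As an illustration of the quantity being plotted, the sketch below sums the feature intensities of each sample and orders the result by acquisition time. The sample records, field names and `tic_by_run_order` helper are hypothetical and are not part of the toolbox.

```python
from datetime import datetime


def tic_by_run_order(samples):
    """Sort samples by acquisition time and return (name, TIC) pairs.

    TIC is taken here as the sum of all feature intensities for a sample.
    """
    ordered = sorted(samples, key=lambda sample: sample["acquired"])
    return [(sample["name"], sum(sample["intensities"])) for sample in ordered]


# Illustrative records, deliberately out of acquisition order
samples = [
    {"name": "S3", "acquired": datetime(2020, 1, 1, 12, 0), "intensities": [90.0, 180.0, 30.0]},
    {"name": "S1", "acquired": datetime(2020, 1, 1, 10, 0), "intensities": [100.0, 200.0, 50.0]},
    {"name": "S2", "acquired": datetime(2020, 1, 1, 11, 0), "intensities": [95.0, 190.0, 45.0]},
]

tic_profile = tic_by_run_order(samples)
for name, tic in tic_profile:
    print(name, tic)
```

In this toy series the TIC falls steadily with run order, which is the kind of gradual decline the plot is intended to reveal.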

With our instrumental set-up (recent generation Waters QToF instruments), we implement an automatic gain control (see Lewis et al [1]). Briefly, throughout each experiment, the voltage applied to the MS detector is automatically adjusted to compensate for trends in instrument performance, which, especially when the increments in applied voltage are large, has a noticeable effect on the total ion count (TIC) of the sample. Although with our current set-up changes in detector voltage are capped and thus this is minimised, this was not always (and may not always be) the case. Therefore, an additional figure of TIC coloured by detector voltage is provided (see Figure 3 in tutorial).

Correlation to dilution

Correlation to dilution is one metric by which feature quality can be determined. By inclusion of a dilution series (Serial Dilution Sample, SRD) the correlation to dilution for each feature can be calculated. A histogram of the resulting values shows the distribution of correlation to dilution (Figure 4) and a TIC plot for the SRD samples can be used to assess the overall behaviour of the dilution series (Figure 5).

A high quality dataset should contain only features that can be shown to be measured accurately with respect to the true intensity, i.e. to scale with dilution. During feature filtering, a threshold in correlation to dilution (default value 0.7) is used to exclude all features which do not respond to dilution (see Sample and Feature Masks for details). Figure 4 shows the distribution in correlation to dilution segmented by mean feature intensity. If the distribution in correlation to dilution values is not highly skewed to high values (especially for high and medium intensity features), the reason for this needs investigating.
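The filtering idea can be sketched in a few lines. The example assumes a Pearson correlation between feature intensity and the expected dilution level, and treats a feature as passing when its correlation exceeds the default threshold of 0.7; the correlation method, the strict comparison, the `pearson` helper and the data are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either input has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Expected dilution (percent) of each SRD sample, and two toy features
dilution = [10, 20, 40, 60, 80, 100]
features = {
    "responds": [1.1, 2.0, 3.9, 6.2, 7.9, 10.1],  # scales with dilution
    "flat": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],       # does not respond to dilution
}

THRESHOLD = 0.7  # default value quoted in the text
correlations = {name: pearson(dilution, values) for name, values in features.items()}
passing = {name: r > THRESHOLD for name, r in correlations.items()}
print(passing)
```

A feature with no variance yields NaN, which fails the comparison and is therefore treated as not passing in this sketch.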

The first thing to check in this case is that the overall trend in TIC for the dilution series samples corresponds to the expected dilution as defined in the ‘Basic CSV’ file (see CSV template for metadata import); this is shown in Figure 5. Any outliers (for example, mis-injections) can be excluded, which may have a substantial impact on the resulting correlation values.

If a large number of SRD samples are not scaling with dilution, and the distribution in correlation values is poor, the cause of this should be investigated across all stages, from acquisition, through conversion and peak detection.

Relative standard deviation, RSD

Another key metric by which feature quality can be assessed is the relative standard deviation (RSD). By inclusion of precision reference samples (Study Reference, SR, or Long-Term Reference, LTR) the RSD for each feature can be calculated. A histogram of the resulting values shows the distribution of RSD in the SR samples (Figure 6) and a plot of the RSD for each feature by sample type (Figure 9) allows comparison of the variation observed between QC and study samples.

A high quality dataset should contain only features that can be shown to be measured precisely from multiple acquisitions across the run (in this case this is provided by repeated injections of the pooled SR sample). During feature filtering a threshold in RSD (default value 30) is used to exclude all features which cannot be measured precisely across the run (see Sample and Feature Masks for details). Figure 6 shows the distribution in RSD segmented by mean feature intensity. If the distribution is not skewed to low values (especially for high and medium intensity features), the reason for this needs investigating.

The first thing to check is whether there are substantial run-order and batch trends (Figure 2). If these are present, the RSD in the SR samples will be skewed to higher values, and batch and run-order correction should be applied first. Additionally, outlying SR samples can inflate the RSD: if a small number of SR samples demonstrate an unusual TIC (which is not shown by surrounding SS samples), these should be excluded before RSD is calculated.

In addition to the requirement that features are measured precisely, the variance observed in the study samples should exceed that measured in the SR samples, with the expectation that biological variance should exceed analytical variance. The plot comparing the RSD measured in the different sample classes (study reference sample, study samples etc.) provides insight into variance structures in the dataset (Figure 9).
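Both checks described above (precision in the SR samples, and study-sample variation exceeding SR variation) can be sketched as follows. The example takes RSD as 100 × standard deviation / mean, expressed in percent, and assumes the sample standard deviation; the `rsd` helper, the strict comparisons and the data are illustrative and not taken from the toolbox.

```python
import statistics


def rsd(values):
    """Relative standard deviation in percent (sample standard deviation / mean)."""
    mean = statistics.mean(values)
    if mean == 0:
        return float("nan")
    return 100.0 * statistics.stdev(values) / mean


RSD_THRESHOLD = 30  # default value quoted in the text

# Toy intensities for two features, split by sample type
features = {
    "good": {"SR": [100, 102, 98, 101, 99], "SS": [80, 120, 150, 60, 90]},
    "noisy": {"SR": [100, 160, 40, 130, 70], "SS": [90, 110, 100, 95, 105]},
}

summary = {}
for name, by_type in features.items():
    rsd_sr = rsd(by_type["SR"])
    rsd_ss = rsd(by_type["SS"])
    summary[name] = {
        "rsd_sr": rsd_sr,
        "rsd_ss": rsd_ss,
        # Precision check: repeated SR injections should vary little
        "passes_rsd": rsd_sr < RSD_THRESHOLD,
        # Variation in study samples should exceed analytical variation
        "biological_exceeds_analytical": rsd_ss > rsd_sr,
    }

for name, result in summary.items():
    print(name, result)
```

Here the "good" feature is measured precisely in the SR samples and varies more in the study samples, while the "noisy" feature fails both checks.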

Finally, to assess the main feature quality metrics together, a plot of RSD vs. correlation to dilution is provided (see Figure 7 in tutorial).

Chromatographic peak width

If available, a histogram of chromatographic peak width is plotted (Figure 8).

Narrower peaks mean better chromatographic resolution, while broadening in peak width (when compared with previous runs) may indicate ageing of the column, which may need replacing.

Ion map

The ion map visualises the location of the detected features in the m/z and retention time space of the assay (Figure 10).

This plot can be used to assess potential feature exclusion ranges, for example, where the retention time is outside the useful range of the assay, or where signals resulting from polymer contamination are present.
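Once exclusion ranges have been read off the ion map, turning them into a feature mask is straightforward. The sketch below is hypothetical throughout: the window format, the `in_window` helper, and the retention time and m/z values are invented for illustration, and True in the resulting mask means the feature is kept.

```python
def in_window(feature, window):
    """True if the feature lies inside both the RT and m/z range of the window."""
    rt_low, rt_high = window["rt"]
    mz_low, mz_high = window["mz"]
    return rt_low <= feature["rt"] <= rt_high and mz_low <= feature["mz"] <= mz_high


INF = float("inf")

# Hypothetical exclusion windows
windows = [
    {"rt": (0.0, 0.5), "mz": (0.0, INF)},      # before the useful retention time range
    {"rt": (0.0, INF), "mz": (400.0, 401.0)},  # band attributed to a contaminant
]

features = [
    {"name": "F1", "rt": 0.3, "mz": 150.05},
    {"name": "F2", "rt": 3.2, "mz": 400.5},
    {"name": "F3", "rt": 5.1, "mz": 250.1},
]

# Keep a feature only if it falls in none of the exclusion windows
feature_mask = [not any(in_window(f, w) for w in windows) for f in features]
print(feature_mask)
```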

## Feature Summary Report: NMR Datasets

The NMR feature summary report provides visualisations summarising the quality of the dataset with regard to the quality control criteria previously described in Dona et al [2], and can be run using:

nPYc.reports.generateReport(nmrData, 'feature summary')


The visualisations include various metrics by which dataset quality can be assessed. In order, these consist of:

• Chemical shift calibration (Figure 1)
• Line width (Figures 2 and 3)
• Baseline consistency (Figure 4)
• Quality of solvent suppression (Figure 5)

For several of these parameters (for example, line width), acceptable default values are pre-defined in the configuration SOP, see Built-in Configuration SOPs for details. If different values are required, these can be set by the user by modifying the SOP, during data import, or directly at any point while running the pipeline. For more information, see Datasets and for examples, Installation and Tutorials.

Any samples failing any of the above criteria are flagged, both in the appropriate plots and in the table at the end of the report.

The following sections describe how the quality for each of these is assessed.

Chemical shift calibration

Variations in sample temperature between acquisitions can result in minor deviations in the chemical shift scale between spectra. To correct these shifts, the toolbox uses an adaptation of the technique published in Pearce et al [3].

Subsequently, the chemical shift calibration algorithm detects deviation from the expected delta ppm and flags those samples outside of the empirical 95% bound as estimated from the whole dataset (Figure 1).
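One way to read the "empirical 95% bound" is as the central 95% interval of the observed deviations, i.e. the range between the 2.5th and 97.5th percentiles of the whole dataset. The sketch below follows that assumption; the `percentile` helper (linear interpolation), the strict comparisons and the data are illustrative and do not reproduce the toolbox's algorithm.

```python
def percentile(values, q):
    """Percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Toy deviations from the expected delta ppm: 39 well-calibrated spectra
# followed by two spectra that are clearly shifted
deltas = [0.001 * ((i % 5) - 2) for i in range(39)] + [-0.02, 0.03]

low = percentile(deltas, 2.5)
high = percentile(deltas, 97.5)

# Flag the indices of spectra falling strictly outside the empirical bounds
flagged = [i for i, delta in enumerate(deltas) if delta < low or delta > high]
print(low, high, flagged)
```

Because the bounds are estimated from the data themselves, a few spectra at the extremes can be flagged in any dataset, so flagged samples are best treated as candidates for inspection.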

If spectra are failing calibration, firstly the presence of the target resonance should be checked, and if required, a different peak target can be defined as described above.

Line width

Spectral line width is calculated by fitting a pseudo-Voigt line shape to a pre-specified signal on the native-resolution Fourier-transformed spectrum at import, using the lmfit module [4] to optimise the fit.

In the default configuration of the toolbox, line-width is calculated by fitting the TSP singlet at (ppm=0) in urine spectra, and the lactate quartet at (ppm=4.11) in serum or plasma.
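For reference, a pseudo-Voigt profile is a weighted sum of a Gaussian and a Lorentzian that share the same width. The sketch below evaluates the line shape using the parametrisation given in the lmfit documentation for its pseudo-Voigt model, in which the full width at half maximum equals 2σ. It only evaluates the function and is not the toolbox's fitting code; the parameter values are illustrative.

```python
import math


def pseudo_voigt(x, amplitude, center, sigma, fraction):
    """Pseudo-Voigt profile: (1 - fraction) * Gaussian + fraction * Lorentzian.

    The Gaussian width is rescaled so that both components, and hence their
    sum, have a full width at half maximum of 2 * sigma.
    """
    sigma_g = sigma / math.sqrt(2.0 * math.log(2.0))
    gaussian = math.exp(-((x - center) ** 2) / (2.0 * sigma_g ** 2)) / (
        sigma_g * math.sqrt(2.0 * math.pi)
    )
    lorentzian = sigma / (math.pi * ((x - center) ** 2 + sigma ** 2))
    return amplitude * ((1.0 - fraction) * gaussian + fraction * lorentzian)


# Illustrative parameters for a peak centred at 0
amplitude, center, sigma, fraction = 1.0, 0.0, 0.3, 0.5

peak = pseudo_voigt(center, amplitude, center, sigma, fraction)
half_max = pseudo_voigt(center + sigma, amplitude, center, sigma, fraction)
line_width = 2.0 * sigma  # full width at half maximum in this parametrisation

print(half_max / peak, line_width)
```

Evaluating the profile one σ away from the centre gives half of the peak height, which is why the fitted σ translates directly into a line width.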

A box plot of the calculated line width values coloured by sample type is plotted in Figure 2.

Any samples with values above the line width threshold (here set to 0.8) are also plotted (Figure 3).

Depending on the number of samples failing the line width checks, either individual samples may be re-run, or, if necessary, the acquisition parameters adjusted by the spectroscopist.

Baseline consistency

Baseline consistency is calculated based on two regions at either end of the spectrum expected to contain only electronic noise. For these regions, the 5th and 95th percentile bounds in intensity are calculated using all the points in all the spectra. For each individual spectrum, if more than 95% of the intensity points fall outside of these bounds, the sample is flagged for review (Figure 4).
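The procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the toolbox's code: `percentile` and `flag_baseline` are hypothetical helpers, percentiles use linear interpolation, and the percentile bounds and 95% criterion are taken directly from the text as function defaults.

```python
def percentile(values, q):
    """Percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def flag_baseline(regions, lower_q=5.0, upper_q=95.0, max_fraction_outside=0.95):
    """Flag spectra whose noise-region points mostly fall outside pooled bounds.

    `regions` holds, for each spectrum, the intensity points of its baseline region.
    """
    pooled = [point for region in regions for point in region]
    low = percentile(pooled, lower_q)
    high = percentile(pooled, upper_q)
    flags = []
    for region in regions:
        outside = sum(1 for point in region if point < low or point > high)
        flags.append(outside / len(region) > max_fraction_outside)
    return flags


# Toy data: 19 spectra with a flat baseline and one with a large offset
normal = [-1.0 + 0.1 * j for j in range(20)]
offset = [10.0 + 0.1 * j for j in range(20)]
regions = [list(normal) for _ in range(19)] + [offset]

flags = flag_baseline(regions)
print(flags)
```

In this toy dataset only the spectrum with the offset baseline is flagged, since all of its points lie above the pooled upper bound.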

The phasing of spectra flagged for review should first be checked, and adjusted if applicable. If a larger number of samples in the dataset fail, the spectrometer acquisition parameters (such as receiver gain settings) and sample preparation (such as dilution) should be revised.

Quality of solvent suppression

The solvent suppression quality control is performed by applying the same method as above to the regions flanking either side of the residual solvent peak (Figure 5).

This test normally flags very dilute samples for which it might be difficult to obtain a high quality spectrum without adjusting the sample preparation. However, for these spectra, re-acquisition with more manual adjustment of the solvent suppression parameters may substantially improve the data.

## Feature Summary Report: NMR Targeted Datasets

The feature summary report provides visualisations summarising the quality and distribution of values across samples for each individual feature. This report can be obtained by running:

nPYc.reports.generateReport(TargetedData, 'feature summary')


In order, for an NMR targeted dataset these consist of:

• Summary of quantification parameters (Table 1)
• Relative standard deviation, RSD (Figure 2, Table 2)
• Feature distributions (Figure 3)

Summary of quantification parameters

The initial set of tables in the targeted feature summary report summarise information about each of the quantified features (including quantification parameters and reference ranges if available). The first table gives overall results, and in subsequent tables the features and results are broken down by the quantification type.

Depending on the data generation process, the confidence in quantified values can vary greatly, ranging from semi-quantitative measurements (where area is reported but not concentration) to quantitative values (where absolute concentrations are reported) (Broadhurst et al [5]).

The QuantificationType field describes the rigour of the quantification of each compound, see class nPYc.enumerations.QuantificationType for all available options. In Bruker NMR Targeted methods, compounds are quantified using a quantitative method that does not rely on internal standards, therefore ‘QuantOther’ is the recorded value.

Similarly, as a variety of calibration methods can be employed, the CalibrationMethod defines how the calibration curve and spiked standards interact to establish a quantitative measurement, see class nPYc.enumerations.CalibrationMethod for all available options. In Bruker NMR Targeted methods, a quantitative approach that does not rely on calibration curves and internal standards is utilised, therefore ‘otherCalibration’ is the recorded value.

For a given analytical platform, each compound will have a range over which its concentration can be satisfactorily determined; outside this range, the reported value could differ substantially from the true sample concentration (Synovec et al [6]). The definition of what is “satisfactory” and how this range (sometimes called the linear range) is determined is specific to the analytical platform and to common guidelines set by the community and regulatory agencies (Lee et al [7]).

Depending on the quantification method employed, there are several different quantification measures that may be reported. The LLOQ (lower limit of quantification) and ULOQ (upper limit of quantification) are the lowest and highest concentration values, respectively, between which quantitative results can be obtained with a specific degree of confidence. When reporting quantitative data, current convention is to report values below the LLOQ as “<LLOQ” and values above the ULOQ as “>ULOQ”. Data extrapolated outside of these limits are typically not reported in published results as they do not satisfy a predefined degree of confidence.
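The reporting convention can be expressed as a small helper. The function name, the treatment of values exactly at a limit as quantifiable, and the example limits are assumptions made for illustration.

```python
def report_value(value, lloq, uloq):
    """Format a measurement following the <LLOQ / >ULOQ reporting convention."""
    if value < lloq:
        return "<LLOQ"
    if value > uloq:
        return ">ULOQ"
    return f"{value:g}"


# Hypothetical limits of quantification for one compound
LLOQ, ULOQ = 1.0, 100.0

measurements = [0.5, 42.0, 250.0]
reported = [report_value(value, LLOQ, ULOQ) for value in measurements]
print(reported)
```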

Alternatively, limits of detection (LOD) are sometimes reported, as is the case for the Bruker NMR Targeted outputs.

Relative standard deviation, RSD

As for the MS profiling datasets, for targeted datasets Relative Standard Deviation can be calculated for each feature, and on each sample type (Figure 2, Table 2).

Figure 2 allows a comparative visualisation of the RSD per feature across each sample type.

A high quality dataset should contain only features that can be shown to be measured precisely from multiple acquisitions across the run (in this case this is provided by repeated injections of the pooled SR sample). Subsequently, at the feature filtering stage a threshold in RSD (default value 30) is used to exclude all features that cannot be measured precisely across the run.

From both Figure 2 and Table 2 it can be seen that for this dataset there are many features with zero values across all samples (and thus also an RSD of zero); these features can also be removed from the dataset if required.

Feature distributions

Finally, violin plots giving the distribution in intensity for each measured feature and for each sample type are shown in Figure 3.

These can also be used to identify features with a very high proportion of zeros or values outside the limits of quantification.

## Dataset Specific Reporting Syntax and Parameters

The main function parameters (which may be of interest to advanced users) are as follows:

class nPYc.reports._generateReportMS

Summarise different aspects of an MS dataset

Generate reports for feature summary, correlation to dilution, batch correction assessment, batch correction summary, feature selection, final report, final report abridged, or final report targeted abridged

• ‘feature summary’ Generates feature summary report, plots figures including those for feature abundance, sample TIC and acquisition structure, correlation to dilution, RSD and an ion map.
• ‘correlation to dilution’ Generates a more detailed report on correlation to dilution, broken down by batch subset with TIC, detector voltage, a summary, and heatmap indicating potential saturation or other issues.
• ‘batch correction assessment’ Generates a report before batch correction showing TIC overall and intensity and batch correction fit for a subset of features, to aid specification of batch start and end points.
• ‘batch correction summary’ Generates a report post batch correction with pertinent figures (TIC, RSD etc.) before and after.
• ‘feature selection’ Generates a summary of the number of features passing feature selection (with current settings as defined in the SOP), and a heatmap showing how this number would be affected by changes to RSD and correlation to dilution thresholds.
• ‘final report’ Generates a summary of the final dataset, lists sample numbers present, a selection of figures summarising dataset quality, and a final list of samples missing from acquisition.
• ‘final report abridged’ Generates an abridged summary of the final dataset, lists sample numbers present, a selection of figures summarising dataset quality, and a final list of samples missing from acquisition.
• ‘final report targeted abridged’ Generates an abridged summary of the final targeted (peakPantheR) dataset, lists sample numbers present, a selection of figures summarising dataset quality, feature distributions, and a final list of samples missing from acquisition.

Parameters:

• msDataTrue (MSDataset) – MSDataset to report on
• reportType (str) – Type of report to generate, one of feature summary, correlation to dilution, batch correction, feature selection, final report, final report abridged, or final report targeted abridged
• withExclusions (bool) – If True, only report on features and samples not masked by the sample and feature masks
• withArtifactualFiltering (None or bool) – If None use the value from Attributes['artifactualFilter']. If True apply artifactual filtering to the feature selection report and final report
• destinationPath (None or str) – If None plot interactively, otherwise save report to the path specified
• msDataCorrected (MSDataset) – Only if batch correction, if msDataCorrected included will generate report post correction
• pcaModel (PCAmodel) – Only if final report, if PCAmodel object is available PCA scores plots coloured by sample type will be added to report

class nPYc.reports._generateReportNMR

Generate reports on NMRdataset objects, possible options are: feature summary or final report

• ‘feature summary’ Generates feature summary report/ QC summary report, plots figures including those for feature calibration check against glucose or TSP, linewidth box plot and baseline/water peak plots.
• ‘final report’ Generates a summary of the final dataset, lists sample numbers present, a selection of figures summarising dataset quality, and a final list of samples missing from acquisition.

Parameters:

• nmrData (NMRDataset) – NMRDataset to report on
• reportType (str) – Type of report to generate, one of feature summary, or final report
• withExclusions (bool) – If True, only report on features and samples not masked by the sample and feature masks
• destinationPath (None or str) – If None plot interactively, otherwise save report to the path specified

class nPYc.reports._generateReportTargeted

Summarise different aspects of a Targeted Dataset

Generate reports for feature summary, merge LOQ assessment or final report

• ‘feature summary’ Generates feature summary report, …
• ‘merge loq assessment’ Generates a report before mergeLimitsOfQuantification(), highlighting the impact of updating limits of quantification across batch. List and plot limits of quantification that are altered, number of samples impacted.
• ‘final report’ Generates a summary of the final dataset, lists sample numbers present, a selection of figures summarising dataset quality, and a final list of samples missing from acquisition.

Parameters:

• tDataIn (TargetedDataset) – TargetedDataset to report on
• reportType (str) – Type of report to generate, one of feature summary, merge loq assessment or final report
• withExclusions (bool) – If True, only report on features and samples not masked by sample and feature masks
• destinationPath (None or str) – If None plot interactively, otherwise save report to the path specified
• numberPlotPerRowLOQ (int) – Only if merge loq assessment, the number of subplots to place on each row
• numberPlotPerRowFeature (int) – Only if feature summary or final report, the number of subplots to place on each row
• percentRange (None or float) – None or float, percentage range for acceptable accuracy [100 - percentRange, 100 + percentRange] and precision [0, percentRange]
• pcaModel (PCAmodel) – Only if final report, if PCAmodel object is available PCA scores plots coloured by sample type will be added to report

Raises:

• ValueError – If ‘tData’ does not satisfy the BasicTargetedDataset definition
• ValueError – If ‘reportType’ is not feature summary, merge LOQ assessment or final report
• TypeError – If ‘withExclusions’ is not a bool
• TypeError – If ‘destinationPath’ is not None or str
• TypeError – If ‘numberPlotPerRowLOQ’ is not int
• TypeError – If ‘numberPlotPerRowFeature’ is not int
• TypeError – If ‘percentRange’ is not None or float

[1] Matthew R Lewis, Jake TM Pearce, Konstantina Spagou, Martin Green, Anthony C Dona, Ada HY Yuen, Mark David, David J Berry, Katie Chappell, Verena Horneffer-van der Sluis, Rachel Shaw, Simon Lovestone, Paul Elliott, John Shockcor, John C Lindon, Olivier Cloarec, Zoltan Takats, Elaine Holmes and Jeremy K Nicholson. Development and Application of Ultra-Performance Liquid Chromatography-TOF MS for Precision Large Scale Urinary Metabolic Phenotyping. Analytical Chemistry, 88(18):9004-9013, 2016. URL: http://dx.doi.org/10.1021/acs.analchem.6b01481
[2] Anthony C Dona, Beatriz Jiménez, Hartmut Schäfer, Eberhard Humpfer, Manfred Spraul, Matthew R Lewis, Jake TM Pearce, Elaine Holmes, John C Lindon and Jeremy K Nicholson. Precision High-Throughput Proton NMR Spectroscopy of Human Urine, Serum, and Plasma for Large-Scale Metabolic Phenotyping. Analytical Chemistry, 86(19):9887-9894, 2014. URL: http://dx.doi.org/10.1021/ac5025039
[3] Jake TM Pearce, Toby J Athersuch, Timothy MD Ebbels, John C Lindon, Jeremy K Nicholson and Hector C Keun. Robust Algorithms for Automated Chemical Shift Calibration of 1D 1H NMR Spectra of Blood Serum. Analytical Chemistry, 80(18):7158-62, 2008. URL: http://dx.doi.org/10.1021/ac8011494
[4] Matt Newville, Renee Otten, Andrew Nelson, Antonino Ingargiola, Till Stensitzki, Dan Allan, Austin Fox, Michał, Glenn, Yoav Ram, MerlinSmiles, Li Li, Christoph Deil, Stuermer, Alexandre Beelen, Oliver Frost, gpasquev, Allan L. R. Hansen, Alexander Stark, Tim Spillane, Shane Caldwell, Anthony Polloreno, Nicholas Earl, colgan, Robbie Clarken, Kostis Anagnostopoulos, Jose Borreguero, deep-42-thought, Ben Gamari and Anthony Almarza. lmfit. 2018. URL: https://doi.org/10.5281/zenodo.1249416
[5] David Broadhurst, Royston Goodacre, Stacey N Reinke, Julia Kuligowski, Ian D Wilson, Matthew R Lewis and Warwick B Dunn. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics, 14(6):72, 2018. URL: https://doi.org/10.1007/s11306-018-1367-3
[6] Robert E Synovec and Edward S Yeung. Improvement of the Limit of Detection in Chromatography by an Integration Method. Analytical Chemistry, 57(12):2162-2167, 1985. URL: https://doi.org/10.1021/ac00289a001
[7] Jean W Lee, Viswanath Devanarayan, Yu Chen Barrett, Russell Weiner, John Allinson, Scott Fountain, Stephen Keller, Ira Weinryb, Marie Green, Larry Duan, James A Rogers, Robert Millham, Peter J O’Brien, Jeff Sailstad, Masood Khan, Chad Ray and John A Wagner. Fit-for-purpose method development and validation for successful biomarker measurement. Pharmaceutical Research, 23(2):312-28, 2006. URL: http://dx.doi.org/10.1007/s11095-005-9045-3