Data Dictionary
Effective use of pgfinder requires an understanding of the inputs and outputs of the software.
Inputs
pgfinder takes data from mass spectrometry instruments, a “database” of expected
masses, and a set of user-specified modifications.
FTRS Files
FTRS files contain deconvoluted data generated by the Byos®
software and have the extension .ftrs.
MaxQuant Files
MaxQuant files are output by the MaxQuant software. They are
tab-separated value (TSV) files with a .txt extension.
Modifications
Any number of modifications can be selected to enrich the search space of the database of masses against which the input data is compared. Allowed modifications are:
| Modification | Description |
|---|---|
| Sodium | Search for masses corresponding to sodium adducts |
| Potassium | Search for masses corresponding to potassium adducts |
| Anh | Search for anhydromuropeptides |
| DeAc | Search for deacetylated muropeptides |
| DeAc_Anh | Search for deacetylated anhydromuropeptides |
| Nude | Search for muropeptides with an extra GlcNAc-MurNAc disaccharide |
| Decay | Correct output taking into account in-source decay products |
| Amidation | Search for amidated muropeptides |
| Amidase | Search for peptides resulting from amidase cleavage (GlcNAc-MurNAc loss) |
| Double_Anh | Search for anhydromuropeptides (2 Anhydro groups) |
| Multimers | Search for multimers resulting from 3-3 and 4-3 crosslinks |
| Multimers Glyco | Search for multimers resulting from transglycosylation (no transpeptidation) |
| Multimer Lac | Search for lactyl-peptide multimers |
| O-Ac | Search for O-acetylated muropeptides |
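To illustrate what "enriching the search space" can mean, the sketch below adds sodium and potassium adduct entries to a small mass list. It is not pgfinder's implementation: the function name `expand_with_adducts`, the label format, and the example mass are hypothetical, and the sketch assumes that each adduct replaces one hydrogen, so the shift is the monoisotopic mass difference Na − H or K − H.

```python
# Mass shifts in Da, assuming the adduct ion replaces one hydrogen.
ADDUCT_SHIFTS = {
    "Sodium": 21.981944,     # Na (22.989769) - H (1.007825)
    "Potassium": 37.955882,  # K (38.963706) - H (1.007825)
}


def expand_with_adducts(masses, modifications):
    """Return the original entries plus one shifted entry per selected adduct."""
    expanded = dict(masses)
    for mod in modifications:
        shift = ADDUCT_SHIFTS[mod]
        for structure, mass in masses.items():
            # Hypothetical labelling scheme: append the modification name.
            expanded[f"{structure} ({mod})"] = round(mass + shift, 4)
    return expanded


base = {"gm-AEJA|1": 941.4077}  # illustrative value, not taken from a real mass list
search_space = expand_with_adducts(base, ["Sodium", "Potassium"])
for structure, mass in search_space.items():
    print(structure, mass)
```

Each selected modification multiplies the number of candidate masses, which is why selecting many modifications at once can increase the number of ambiguous matches.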
Mass Databases (Lists)
Mass databases are lists of structures and their associated masses. They are in CSV format
with a .csv extension. pgfinder has built-in mass lists for Escherichia coli and
Clostridium difficile, but can take a different mass list as an input.
| Column | Description | Unit |
|---|---|---|
| Structure | Structure code | NA |
| Monoisotopicmass | Monoisotopic mass | atomic mass unit |
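As a rough illustration of how a custom mass list might be checked and used, the sketch below reads a CSV with the two columns documented above and looks up structures within a ppm tolerance of an observed mass. The helper names `load_mass_list` and `match_mass` are hypothetical, the masses are illustrative values, and the matching rule (tolerance relative to the theoretical mass) is an assumption rather than pgfinder's confirmed behaviour.

```python
import csv
import io

# Illustrative mass list; the values are placeholders for the sketch.
MASS_LIST_CSV = """Structure,Monoisotopicmass
gm-AEJ|1,870.3706
gm-AEJA|1,941.4077
"""


def load_mass_list(handle):
    """Read a mass list and check that both documented columns are present."""
    reader = csv.DictReader(handle)
    missing = {"Structure", "Monoisotopicmass"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"mass list is missing columns: {sorted(missing)}")
    return [(row["Structure"], float(row["Monoisotopicmass"])) for row in reader]


def match_mass(observed, mass_list, ppm):
    """Return structures whose mass lies within `ppm` of the observed mass."""
    return [
        structure
        for structure, theo in mass_list
        if abs(observed - theo) <= theo * ppm / 1e6
    ]


mass_list = load_mass_list(io.StringIO(MASS_LIST_CSV))
matches = match_mass(941.4100, mass_list, ppm=10)
print(matches)
```

Validating the header before use catches the most common problem with hand-made lists: a misspelt or missing column name.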
Reference Masses
The reference masses file defines the masses of the building blocks used to determine the mass of target structures (muropeptides). A default file is provided as part of the package, but a user-supplied file can be used instead. The file should be an ASCII text CSV file (not an Excel file) with the following columns.
| Column | Description | Units |
|---|---|---|
| Code | Encoding of the component | string |
| Mass | Atomic Mass of the component | Daltons |
| Name | Name of component | string |
Target Structures
The target structures are species-specific and define the muropeptides for which masses are to be calculated. A number of built-in options are available, but users can also supply their own file. The file should be an ASCII text CSV file (not an Excel file) with a single column that defines each structure using the codes defined in the reference masses file. An example is shown below.
Structure
gm |0
gm-gm |0
gm(Anh) |1
gm(-Ac) |1
gm-AEJ |1
gm-AEJA |1
gm-AEJG |1
gm-AEJAG |1
gm-AEJKR |1
gm-AEJ (Anh) |1
gm-AEJA (Anh) |1
gm-AEJA (-Ac) |1
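The entries in the example above follow a regular pattern: a dash-separated chain of codes, an optional modification in parentheses, and a numeric suffix after `|`. The sketch below splits an entry into those parts. It is an illustration only: `parse_structure` is a hypothetical helper, the pattern is inferred from the example rather than from a formal grammar, and the meaning of the numeric suffix is not defined here, so it is returned as a plain integer.

```python
import re

# Inferred from the example entries: chain of codes, optional "(modification)",
# then a numeric suffix after "|". This is an assumption, not a formal grammar.
ENTRY = re.compile(
    r"^\s*(?P<chain>[^()|]+?)\s*(?:\((?P<mod>[^)]+)\))?\s*\|\s*(?P<suffix>\d+)\s*$"
)


def parse_structure(entry):
    """Split a target structure entry into its parts."""
    match = ENTRY.match(entry)
    if match is None:
        raise ValueError(f"unrecognised structure entry: {entry!r}")
    return {
        "parts": match["chain"].split("-"),  # e.g. ["gm", "AEJA"]
        "modification": match["mod"],        # None when absent
        "suffix": int(match["suffix"]),
    }


for entry in ["gm |0", "gm-gm |0", "gm(-Ac) |1", "gm-AEJA (Anh) |1"]:
    print(entry, "->", parse_structure(entry))
```

A check like this can be run over a user-supplied file to find malformed rows before they reach the mass calculation.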
Species-specific target structures are currently available for:
Bacillus subtilis
Enterococcus faecalis
Enterococcus faecium
Escherichia coli
Staphylococcus aureus
Outputs
pgfinder outputs CSV (.csv) files. The columns in these files depend on the input file format.
Embedded Metadata
The first column contains the following metadata:
| Data | Description |
|---|---|
| file | Input data file |
| masses_file | Mass list file |
| rt_window | Retention time window |
| modifications | List of modifications |
| ppm | ppm tolerance |
| consolidation_ppm | ppm tolerance for consolidation |
| version | PGFinder version used in analysis |
PGFinder Output
| Field | Description |
|---|---|
| Metadata | Metadata about input files. |
| ID | Peak identifier. |
| RT (min) | Retention time in minutes. |
| Charge | Charge states at which the mass was observed. Can be used to work back from the monoisotopic mass to the recorded raw mass/charge ratios. |
| Obs (Da) | Observed mass in daltons. |
| Theo (Da) | Theoretical mass in daltons. |
| Delta ppm | Difference between Obs (Da) and Theo (Da), in parts per million. |
| Inferred structure | Inferred peptidoglycan structure. |
| Intensity | Intensity of the peak in relative units. |
| Inferred structure (best match) | Most likely inferred structure. |
| Intensity (best match) | Intensity of the most likely inferred structure. |
| Total Intensity | Sum of intensities for consolidated structures. Can be used to compare how much material was injected/measured between different runs. |
| Structure | Structure of the most likely inferred structure. |
| Abundance (%) | Amount (as a percentage of total intensity) of the inferred structure. |
| Consolidated RT (min) | Consolidated retention time in minutes of the most likely inferred structure. |
| Consolidated Theo (Da) | Consolidated theoretical mass in daltons of the most likely inferred structure. |
| Consolidated Delta ppm | Consolidated difference in parts per million of the most likely inferred structure. |
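Several of the output fields are simple derived quantities. The sketch below shows one plausible way to compute them. It is not pgfinder's code: the function names are hypothetical, the sign convention for the ppm difference (observed minus theoretical, relative to theoretical) is an assumption, and the m/z calculation assumes positive-mode ions that carry one proton per charge.

```python
PROTON_MASS = 1.007276  # Da


def delta_ppm(obs, theo):
    """Difference between observed and theoretical mass in parts per million."""
    return (obs - theo) / theo * 1e6


def mz_from_mass(mass, charge):
    """m/z of an ion with `charge` protons added to a neutral monoisotopic mass."""
    return (mass + charge * PROTON_MASS) / charge


def abundance_percent(intensities):
    """Express each intensity as a percentage of the summed intensity."""
    total = sum(intensities.values())
    return {name: 100 * value / total for name, value in intensities.items()}


# Illustrative values only.
print(round(delta_ppm(941.4100, 941.4077), 2))
print(round(mz_from_mass(941.4077, 2), 4))
print(abundance_percent({"gm-AEJA|1": 30.0, "gm-AEJ|1": 10.0}))
```

Working back to m/z in this way is useful when checking an assignment against the raw spectrum, where peaks are recorded as mass/charge ratios rather than deconvoluted masses.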