# Data Dictionary

Effective use of `pgfinder` requires an understanding of the inputs and outputs of the software.

## Inputs

`pgfinder` takes data from mass spectrometry instruments, along with a "database" of expected masses and a set of user-specified modifications.

### FTRS Files

FTRS files contain deconvolution data generated by the [Byos®](https://proteinmetrics.com/byos/) software and have the extension `.ftrs`.

### MaxQuant Files

MaxQuant files are output by the [MaxQuant](https://www.maxquant.org/) software. They are tab-separated value (`TSV`) files with a `.txt` extension.

### Modifications

Any number of modifications can be selected to enrich the search space of the database of masses against which the input data is compared. Allowed modifications are:

| Modification    | Description                                                                        |
|:----------------|:-----------------------------------------------------------------------------------|
| Sodium          | Search for masses corresponding to sodium adducts                                  |
| Potassium       | Search for masses corresponding to potassium adducts                               |
| Anh             | Search for anhydromuropeptides                                                     |
| DeAc            | Search for deacetylated muropeptides                                               |
| DeAc_Anh        | Search for deacetylated anhydromuropeptides                                        |
| Nude            | Search for muropeptides with an extra GlcNAc-MurNAc disaccharide                   |
| Decay           | Correct output taking into account in-source decay products                        |
| Amidation       | Search for amidated muropeptides                                                   |
| Amidase         | Search for peptides resulting from amidase cleavage (GlcNAc-MurNAc loss)           |
| Double_Anh      | Search for anhydromuropeptides (2 anhydro groups)                                  |
| Multimers       | Search for multimers resulting from 3-3 and 4-3 crosslinks                         |
| Multimers Glyco | Search for multimers resulting from transglycosylation (no transpeptidation)       |
| Multimer Lac    | Search for lactyl-peptide multimers                                                |
| O-Ac            | Search for O-acetylated muropeptides                                               |

### Mass Databases (Lists)

Mass databases are lists of structures and their associated masses. They are in `CSV` format with a `.csv` extension.
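Because modifications enrich the search space of a mass database, one way to picture the two together is a mapping from structure to monoisotopic mass that gains a shifted copy of each entry for every selected modification. The sketch below is illustrative only and is not pgfinder's implementation. It assumes the `Structure` and `Monoisotopicmass` columns listed in the table below; the helper names `read_mass_database` and `expand_with_modifications` and the `structure (Modification)` naming scheme are hypothetical; and the mass shifts are standard monoisotopic values for four of the modifications (pgfinder's own values and rules, for example for multimers or in-source decay, may differ).

```python
import csv
import io

# Illustrative monoisotopic mass shifts in Da, derived from standard atomic masses.
# ASSUMPTION: these are not values read from pgfinder.
MASS_SHIFTS = {
    "Sodium": 21.98194,     # +Na, -H (sodium adduct replaces a proton)
    "Potassium": 37.95588,  # +K, -H (potassium adduct replaces a proton)
    "Anh": -18.01056,       # loss of H2O (anhydro form)
    "DeAc": -42.01056,      # loss of C2H2O (deacetylation)
}


def read_mass_database(text):
    """Parse CSV text with Structure and Monoisotopicmass columns into a dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Structure"]: float(row["Monoisotopicmass"]) for row in reader}


def expand_with_modifications(masses, modifications):
    """Return the original entries plus one shifted entry per modification."""
    expanded = dict(masses)
    for structure, mass in masses.items():
        for mod in modifications:
            # Hypothetical naming scheme for the modified structure.
            expanded[f"{structure} ({mod})"] = mass + MASS_SHIFTS[mod]
    return expanded
```

For example, a placeholder entry `example,500.0` (not a real muropeptide mass) expanded with `["Sodium", "Anh"]` yields three entries at 500.0, about 521.98194 and about 481.98944 Da.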
`pgfinder` has built-in mass lists for *Escherichia coli* and *Clostridium difficile*, but can take a different mass list as input.

| Column           | Description       | Unit             |
|------------------|-------------------|------------------|
| Structure        | Structure code    | NA               |
| Monoisotopicmass | Monoisotopic mass | atomic mass unit |

### Reference Masses

The reference masses file defines the masses of the building blocks used to determine the mass of target structures (muropeptides). A default file is provided as part of the package, but user-supplied files can be used instead. The file should be an ASCII text CSV file (not an Excel file) with the following columns.

| Column | Description                  | Units    |
|:-------|:-----------------------------|:---------|
| Code   | Encoding of the component    | `string` |
| Mass   | Atomic mass of the component | Daltons  |
| Name   | Name of the component        | `string` |

### Target Structures

The target structures are species-specific and define the muropeptides whose masses are to be calculated. A number of options are available, but users can also supply their own file. The file should be an ASCII text CSV file (not an Excel file) with a single column that defines each structure using the codes defined in the [reference masses](#reference-masses) file. An example is shown below:

```
Structure
gm |0
gm-gm |0
gm(Anh) |1
gm(-Ac) |1
gm-AEJ |1
gm-AEJA |1
gm-AEJG |1
gm-AEJAG |1
gm-AEJKR |1
gm-AEJ (Anh) |1
gm-AEJA (Anh) |1
gm-AEJA (-Ac) |1
```

Species-specific target structures are currently available for:

+ _Bacillus subtilis_
+ _Enterococcus faecalis_
+ _Enterococcus faecium_
+ _Escherichia coli_
+ _Staphylococcus aureus_

## Outputs

`pgfinder` outputs `CSV` (`.csv`) files. The columns in these files depend on the input file format.
### Embedded Metadata

The first column contains the following metadata:

| Data                | Description                             |
|:--------------------|:----------------------------------------|
| `file`              | Input data file                         |
| `masses_file`       | Mass list file                          |
| `rt_window`         | Retention time window                   |
| `modifications`     | List of [modifications](#modifications) |
| `ppm`               | ppm tolerance                           |
| `consolidation_ppm` | ppm tolerance for consolidation         |
| `version`           | PGFinder version used in analysis       |

### PGFinder Output

| Field                             | Description |
|:----------------------------------|:------------|
| `Metadata`                        | Metadata about input files. |
| `ID`                              | Peak identifier. |
| `RT (min)`                        | Retention time in minutes. |
| `Charge`                          | Charge states at which the mass was observed. Can be used to work back from the monoisotopic mass to the recorded raw mass-to-charge ratios. |
| `Obs (Da)`                        | Observed mass in [Dalton (unit)](https://en.wikipedia.org/wiki/Dalton_(unit)). |
| `Theo (Da)`                       | Theoretical mass in [Dalton (unit)](https://en.wikipedia.org/wiki/Dalton_(unit)). |
| `Delta ppm`                       | Difference between `Obs (Da)` and `Theo (Da)`, expressed in parts per million (ppm). |
| `Inferred structure`              | Inferred peptidoglycan structure. |
| `Intensity`                       | Intensity of the peak in relative units. |
| `Inferred structure (best match)` | Most likely inferred structure. |
| `Intensity (best match)`          | Intensity of the most likely inferred structure. |
| `Total Intensity`                 | Sum of intensities for consolidated structures. Can be used to compare how much material was injected/measured between different runs. |
| `Structure`                       | Most likely inferred structure. |
| `Abundance (%)`                   | Amount of the inferred structure as a percentage of total intensity. |
| `Consolidated RT (min)`           | Consolidated retention time in minutes of the most likely inferred structure. |
| `Consolidated Theo (Da)`          | Consolidated theoretical mass in [Dalton (unit)](https://en.wikipedia.org/wiki/Dalton_(unit)) of the most likely inferred structure. |
| `Consolidated Delta ppm`          | Consolidated difference in parts per million of the most likely inferred structure. |
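Several of the output fields, and the `ppm` tolerance in the embedded metadata, reduce to simple arithmetic. The sketch below assumes the conventional definitions rather than pgfinder's exact code: `Delta ppm` as (Obs - Theo) / Theo x 10^6, a match accepted when the absolute error is within the `ppm` tolerance, `Abundance (%)` as each intensity divided by the summed intensity, and, for positive-ion mode only, m/z = (M + z x 1.007276) / z, where 1.007276 Da is the proton mass. All function names are hypothetical, and pgfinder's sign convention and rounding may differ.

```python
PROTON_MASS = 1.007276  # Da; ASSUMPTION: positive-ion mode, charge carried by protons


def delta_ppm(observed, theoretical):
    """Mass error of an observed mass relative to a theoretical mass, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, ppm):
    """True if the observed mass lies within +/- ppm of the theoretical mass."""
    return abs(delta_ppm(observed, theoretical)) <= ppm


def mz_from_mass(mass, charge):
    """m/z of a neutral monoisotopic mass carrying `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge


def abundance_percent(intensities):
    """Express each intensity as a percentage of the total intensity."""
    total = sum(intensities.values())
    return {name: 100 * value / total for name, value in intensities.items()}
```

For instance, an observed mass of 1000.01 Da against a theoretical 1000.0 Da gives a `Delta ppm` of about 10, which passes a 20 ppm tolerance but not a 5 ppm one, and a 1000.0 Da species at charge 2 would appear at m/z 501.007276.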