pgfinder.matching
Matching functions
Module Contents
Functions
calc_ppm_tolerance() | Calculates ppm tolerance value. |
filtered_theo() | Generate list of observed structures from theoretical masses dataframe to reduce search space. |
multimer_builder() | Generate multimers (dimers & trimers) from observed monomers. |
modification_generator() | Generate modified muropeptides (calculates new mass and adds modification tag to structure name). |
matching() | Match theoretical masses to observed masses within ppm tolerance. |
clean_up() | Clean up a DataFrame. |
data_analysis() | Perform analysis. |
calculate_ppm_delta() | Calculate the difference in Parts Per Million between observed and theoretical masses. |
pick_most_likely_structures() | Add rows that consolidate ambiguous matches, picking matches with the closest ppm. |
consolidate_results() | Add a final table of muropeptide structures and their relative abundances. |
Attributes
- pgfinder.matching.LOGGER
- pgfinder.matching.COLUMNS
- pgfinder.matching.calc_ppm_tolerance(mw: float, ppm_tol: int = 10) -> float
Calculates ppm tolerance value
- Parameters:
mw (float) – Molecular weight.
ppm_tol (int) – PPM tolerance
- Returns:
Tolerance value for the given molecular weight at ppm_tol parts per million.
- Return type:
float
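As a rough illustration of what a ppm tolerance means, the sketch below converts a tolerance in parts per million into an absolute mass window. The helper name `ppm_window` and the formula `mw * ppm_tol / 1_000_000` are assumptions based on the standard definition of ppm; they are not taken from the pgfinder source.

```python
def ppm_window(mw: float, ppm_tol: int = 10) -> float:
    """Hypothetical helper: absolute mass tolerance for `mw` at `ppm_tol` ppm."""
    # Assumption: the tolerance is the standard ppm fraction of the mass.
    return mw * ppm_tol / 1_000_000


# A 1000 Da species at 10 ppm tolerates a deviation of roughly 0.01 Da.
window = ppm_window(1000.0, 10)
```

Under this assumption the window scales linearly with mass, so heavier structures are matched with a wider absolute window at the same ppm setting.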
- pgfinder.matching.filtered_theo(ftrs_df: pandas.DataFrame, theo_df: pandas.DataFrame, user_ppm: int) -> pandas.DataFrame
Generate list of observed structures from theoretical masses dataframe to reduce search space.
- Parameters:
ftrs_df (pd.DataFrame) – Features dataframe.
theo_df (pd.DataFrame) – Theoretical dataframe.
user_ppm (int) – User specified Parts Per Million.
- Returns:
Dataframe filtered on matches with theoretical masses.
- Return type:
pd.DataFrame
- pgfinder.matching.multimer_builder(theo_df: pandas.DataFrame, multimer_type: str, columns: dict = COLUMNS) -> pandas.DataFrame
Generate multimers (dimers & trimers) from observed monomers.
- Parameters:
theo_df (pd.DataFrame) – Dataframe containing theoretical monomeric structures and their corresponding masses.
multimer_type (str) – Type of multimers to build.
columns (dict) – Dictionary of pgfinder columns, loaded by default from ‘pgfinder/config/columns.yaml’.
- Returns:
Dataframe containing theoretical multimers and their corresponding masses.
- Return type:
pd.DataFrame
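The idea of deriving multimer masses from monomer masses can be sketched in plain Python. Everything here is an assumption made for illustration: the helper `build_dimers`, the made-up monomer names and masses, and the rule that joining two monomers releases one water molecule. The actual function works on DataFrames and takes a `multimer_type`, which this sketch does not model.

```python
from itertools import combinations_with_replacement

WATER_MASS = 18.0106  # approximate monoisotopic mass of H2O, in Da


def build_dimers(monomers: dict[str, float]) -> dict[str, float]:
    """Hypothetical sketch: pair monomers and subtract one water per bond."""
    dimers = {}
    # Pair every monomer with itself and with every other monomer, once each.
    for (name_a, mass_a), (name_b, mass_b) in combinations_with_replacement(
        monomers.items(), 2
    ):
        # Assumption: forming the link between the two monomers loses one H2O.
        dimers[f"{name_a}-{name_b}"] = mass_a + mass_b - WATER_MASS
    return dimers


# Made-up monomer names and masses, for illustration only.
monomers = {"A": 900.0, "B": 1000.0}
dimers = build_dimers(monomers)
```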
- pgfinder.matching.modification_generator(filtered_theo_df: pandas.DataFrame, mod_type: str) -> pandas.DataFrame
Generate modified muropeptides (calculates new mass and adds modification tag to structure name).
- Parameters:
filtered_theo_df (pd.DataFrame) – Pandas DataFrame of theoretical masses that have been filtered.
mod_type (str) – Type of modification to apply.
- Returns:
Pandas DataFrame of modified muropeptides, with recalculated masses and tagged structure names.
- Return type:
pd.DataFrame
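A minimal sketch of the two steps named in the description (shift the mass, tag the structure name) is shown below. The helper `apply_modification`, the tag format, and the structure names are hypothetical; the mass shift of -42.0106 Da is the approximate monoisotopic change for losing an acetyl group (C2H2O) and is used only as an example value.

```python
def apply_modification(
    structures: list[tuple[str, float]], tag: str, mass_delta: float
) -> list[tuple[str, float]]:
    """Hypothetical sketch: return modified copies of (name, mass) pairs."""
    modified = []
    for name, mass in structures:
        # Append the tag to the name and shift the mass by the modification delta.
        modified.append((f"{name} ({tag})", mass + mass_delta))
    return modified


# Made-up structures; the delta approximates the loss of one acetyl group.
structures = [("X1", 1000.0), ("X2", 1500.0)]
modified = apply_modification(structures, "Deacetyl", -42.0106)
```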
- pgfinder.matching.matching(ftrs_df: pandas.DataFrame, matching_df: pandas.DataFrame, set_ppm: int) -> pandas.DataFrame
Match theoretical masses to observed masses within ppm tolerance.
- Parameters:
ftrs_df (pd.DataFrame) – Features DataFrame
matching_df (pd.DataFrame) – Matching DataFrame
set_ppm (int) – PPM tolerance within which theoretical and observed masses are matched.
- Returns:
Dataframe of matches.
- Return type:
pd.DataFrame
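Matching within a ppm tolerance can be illustrated without DataFrames. In the sketch below, `match_within_ppm` is a hypothetical helper and the masses are made up; it assumes the tolerance window is computed relative to the theoretical mass, which may differ from what the library does.

```python
def match_within_ppm(
    observed: list[float], theoretical: dict[str, float], ppm: float
) -> list[tuple[float, str]]:
    """Hypothetical sketch: pair each observed mass with structures in range."""
    matches = []
    for obs in observed:
        for name, theo in theoretical.items():
            # Assumption: the window is `ppm` parts per million of the theoretical mass.
            if abs(obs - theo) <= theo * ppm / 1_000_000:
                matches.append((obs, name))
    return matches


observed = [1000.005, 1500.2]  # made-up observed masses
theoretical = {"X1": 1000.0, "X2": 1500.0}  # made-up theoretical masses
matches = match_within_ppm(observed, theoretical, 10)
```

Here only the first observed mass falls inside a 10 ppm window (0.01 Da around 1000.0); the second is 0.2 Da away from its nearest candidate and is left unmatched.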
- pgfinder.matching.clean_up(ftrs_df: pandas.DataFrame, mass_to_clean: decimal.Decimal, time_delta: float) -> pandas.DataFrame
Clean up a DataFrame.
- Parameters:
ftrs_df (pd.DataFrame) – Features dataframe.
mass_to_clean (Decimal) – Mass to be cleaned.
time_delta (float) – Clean up window.
- Returns:
Tidied Dataframe.
- Return type:
pd.DataFrame
- pgfinder.matching.data_analysis(raw_data_df: pandas.DataFrame, theo_masses_df: pandas.DataFrame, rt_window: float, enabled_mod_list: list, ppm_tolerance: float, consolidation_ppm: float) -> pandas.DataFrame
Perform analysis.
- Parameters:
raw_data_df (pd.DataFrame) – User data as Pandas DataFrame.
theo_masses_df (pd.DataFrame) – Theoretical masses as Pandas DataFrame.
rt_window (float) – Set time window for in-source decay and salt adduct cleanup.
enabled_mod_list (list) – List of modifications to enable.
ppm_tolerance (float) – The ppm tolerance used when matching the theoretical masses of structures to observed ions.
consolidation_ppm (float) – The minimum absolute ppm difference between two matches before one is picked as “most likely” over the other.
- Returns:
Dataframe of matches.
- Return type:
pd.DataFrame
- pgfinder.matching.calculate_ppm_delta(df: pandas.DataFrame, observed: str = COLUMNS['input']['obs'], theoretical: str = COLUMNS['inferred']['mass'], diff: str = COLUMNS['delta']) -> pandas.DataFrame
Calculate the difference in Parts Per Million between observed and theoretical masses.
The PPM difference between observed and theoretical mass is calculated as:
\[\frac{1000000 \times (obs - theor)}{theor}\]
The function ensures the column is placed after the theoretical mass column to facilitate its use.
- Parameters:
df (pd.DataFrame) – Pandas DataFrame of results.
observed (str) – Name of the column holding the observed mass.
theoretical (str) – Name of the column holding the theoretical mass.
diff (str) – Name of the column to be created, which holds the difference in PPM.
- Returns:
Pandas DataFrame with difference noted in column diff.
- Return type:
pd.DataFrame
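The formula given for this function can be checked on single values. The sketch below applies it to one made-up pair of masses; `ppm_delta` is a hypothetical scalar helper, whereas the documented function operates on DataFrame columns.

```python
def ppm_delta(obs: float, theor: float) -> float:
    """Hypothetical scalar version of the documented ppm difference formula."""
    return 1_000_000 * (obs - theor) / theor


# An observed mass 0.01 Da above a theoretical mass of 1000 Da is about +10 ppm.
delta = ppm_delta(1000.01, 1000.0)
```

The sign follows the formula: an observed mass below the theoretical mass gives a negative delta.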
- pgfinder.matching.pick_most_likely_structures(df: pandas.DataFrame, consolidation_ppm: float, columns: dict = COLUMNS) -> pandas.DataFrame
Add rows that consolidate ambiguous matches, picking matches with the closest ppm.
- Parameters:
df (pd.DataFrame) – DataFrame of structures to be processed.
consolidation_ppm (float) – Minimum Parts Per Million tolerance distinguishing matches.
columns (dict) – Dictionary of columns. This defaults to the global COLUMNS, which is read from ‘config/columns.yaml’ and simplifies extension to new formats.
- Returns:
Dataframe of matches within the specified tolerance. Candidates that are not matched are included in the file for completeness.
- Return type:
pd.DataFrame
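One way to read the consolidation rule is sketched below for a single observed ion. The helper `pick_most_likely`, the candidate names, and the strict `<` comparison are assumptions: candidates whose absolute ppm is within `consolidation_ppm` of the best one are treated as indistinguishable and all kept, otherwise only the closest is picked.

```python
def pick_most_likely(
    candidates: dict[str, float], consolidation_ppm: float
) -> list[str]:
    """Hypothetical sketch: `candidates` maps structure name to its ppm delta."""
    if not candidates:
        return []
    # Rank candidates by how close their ppm delta is to zero.
    ranked = sorted(candidates, key=lambda name: abs(candidates[name]))
    best_abs = abs(candidates[ranked[0]])
    # Assumption: a candidate stays ambiguous with the best one when their
    # absolute ppm values differ by less than `consolidation_ppm`.
    return [
        name
        for name in ranked
        if abs(candidates[name]) - best_abs < consolidation_ppm
    ]


candidates = {"X1": 0.5, "X2": -4.0, "X3": 0.9}  # made-up ppm deltas
ambiguous = pick_most_likely(candidates, consolidation_ppm=1.0)
single = pick_most_likely(candidates, consolidation_ppm=0.2)
```

With a threshold of 1.0 ppm the two closest candidates cannot be separated; tightening it to 0.2 ppm leaves a single pick.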
- pgfinder.matching.consolidate_results(df: pandas.DataFrame, intensity_column: str = f"Intensity ({COLUMNS['best_match_suffix']})", structure_column: str = f"Inferred structure ({COLUMNS['best_match_suffix']})", rt_column: str = COLUMNS['input']['rt'], theo_column: str = COLUMNS['inferred']['mass'], ppm_column: str = COLUMNS['delta'], abundance_column: str = COLUMNS['consolidation']['Abundance (%)'], oligomer_column: str = 'Oligomerisation', total_column: str = COLUMNS['consolidation']['Total Intensity'], columns: dict = COLUMNS) -> pandas.DataFrame
Add a final table of muropeptide structures and their relative abundances
- Parameters:
df (pd.DataFrame) – DataFrame of structures to be processed.
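intensity_column (str) – Name of the intensity column.
structure_column (str) – Name of the structure column.
rt_column (str) – Name of the retention time (RT) column.
theo_column (str) – Name of the theoretical mass column.
ppm_column (str) – Name of the delta ppm column.
abundance_column (str) – Name of the abundance column.
oligomer_column (str) – Name of the oligomer column.
total_column (str) – Name of the total intensity column.
columns (dict) – Dictionary of pgfinder columns, defaulting to the global COLUMNS.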
- Returns:
The input dataframe with additional columns containing the consolidated results.
- Return type:
pd.DataFrame
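Relative abundance can be illustrated as each structure's summed intensity divided by the total intensity. The sketch below is a plain-Python stand-in with a hypothetical helper `relative_abundance` and made-up rows; it does not reproduce the extra columns or layout that the documented function adds to the DataFrame.

```python
from collections import defaultdict


def relative_abundance(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Hypothetical sketch: percentage of total intensity per structure."""
    totals: dict[str, float] = defaultdict(float)
    # Sum the intensity of every row that shares a structure name.
    for structure, intensity in rows:
        totals[structure] += intensity
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Express each structure's summed intensity as a percentage of the total.
    return {name: 100 * value / grand_total for name, value in totals.items()}


# Made-up (structure, intensity) rows; X1 appears twice and is summed.
rows = [("X1", 300.0), ("X2", 100.0), ("X1", 100.0)]
abundance = relative_abundance(rows)
```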