pgfinder.matching

Matching functions

Module Contents

Functions

calc_ppm_tolerance(→ float)

Calculates ppm tolerance value

filtered_theo(→ pandas.DataFrame)

Generate list of observed structures from theoretical masses dataframe to reduce search space.

multimer_builder(→ pandas.DataFrame)

Generate multimers (dimers & trimers) from observed monomers.

modification_generator(→ pandas.DataFrame)

Generate modified muropeptides (calculates new mass and adds modification tag to structure name).

matching(→ pandas.DataFrame)

Match theoretical masses to observed masses within ppm tolerance.

clean_up(→ pandas.DataFrame)

Clean up a DataFrame.

data_analysis(→ pandas.DataFrame)

Perform analysis.

calculate_ppm_delta(→ pandas.DataFrame)

Calculate the difference in Parts Per Million between observed and theoretical masses.

pick_most_likely_structures(→ pandas.DataFrame)

Add rows that consolidate ambiguous matches, picking matches with the closest ppm.

consolidate_results(→ pandas.DataFrame)

Add a final table of muropeptide structures and their relative abundances

Attributes

LOGGER

COLUMNS

pgfinder.matching.LOGGER
pgfinder.matching.COLUMNS
pgfinder.matching.calc_ppm_tolerance(mw: float, ppm_tol: int = 10) → float

Calculates ppm tolerance value

Parameters:
  • mw (float) – Molecular weight.

  • ppm_tol (int) – PPM tolerance

Returns:

The ppm tolerance value for the given molecular weight.

Return type:

float
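
As a minimal sketch of what a ppm tolerance means, the snippet below assumes the tolerance scales linearly with mass, i.e. `mw * ppm_tol / 1000000`. The function name `ppm_window` is hypothetical and is not part of pgfinder; the actual implementation of `calc_ppm_tolerance` may differ.

```python
def ppm_window(mw: float, ppm_tol: int = 10) -> float:
    """Return the absolute mass tolerance for ``mw`` at ``ppm_tol`` parts per million."""
    # Assumption: the tolerance is a simple proportion of the mass.
    return (mw * ppm_tol) / 1_000_000


# A mass of 1000 at 10 ppm gives a window of about +/- 0.01 mass units.
tolerance = ppm_window(1000.0, ppm_tol=10)
print(tolerance)
```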

pgfinder.matching.filtered_theo(ftrs_df: pandas.DataFrame, theo_df: pandas.DataFrame, user_ppm: int) → pandas.DataFrame

Generate list of observed structures from theoretical masses dataframe to reduce search space.

Parameters:
  • ftrs_df (pd.DataFrame) – Features dataframe.

  • theo_df (pd.DataFrame) – Theoretical dataframe.

  • user_ppm (int) – User specified Parts Per Million.

Returns:

Dataframe filtered on matches with theoretical masses.

Return type:

pd.DataFrame
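
The idea of reducing the search space can be sketched as keeping only those theoretical rows that have at least one observed mass inside their ppm window. This is an illustration under assumptions, not the pgfinder implementation: the helper `filter_theoretical`, the column names `structure` and `mass`, and the example masses are all hypothetical.

```python
import numpy as np
import pandas as pd


def filter_theoretical(obs_masses, theo_df: pd.DataFrame, ppm: int, mass_col: str = "mass") -> pd.DataFrame:
    """Keep theoretical rows that lie within ``ppm`` of at least one observed mass."""
    theo = theo_df[mass_col].to_numpy(dtype=float)
    obs = np.asarray(obs_masses, dtype=float)
    # Compare every theoretical mass (rows) with every observed mass (columns).
    within = np.abs(obs[None, :] - theo[:, None]) <= (theo[:, None] * ppm / 1_000_000)
    return theo_df[within.any(axis=1)].reset_index(drop=True)


# Hypothetical data: only structure_a has an observed mass nearby.
theo_df = pd.DataFrame({"structure": ["structure_a", "structure_b"], "mass": [941.4080, 1200.0]})
filtered = filter_theoretical([941.4077, 1500.0], theo_df, ppm=10)
print(filtered)
```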

pgfinder.matching.multimer_builder(theo_df: pandas.DataFrame, multimer_type: str, columns: dict = COLUMNS) → pandas.DataFrame

Generate multimers (dimers & trimers) from observed monomers.

Parameters:
  • theo_df (pd.DataFrame) – Dataframe containing theoretical monomeric structures and their corresponding masses.

  • multimer_type (str) – Type of multimers to build.

  • columns (dict) – Dictionary of pgfinder columns, loaded by default from ‘pgfinder/config/columns.yaml’.

Returns:

Dataframe containing theoretical multimers and their corresponding masses.

Return type:

pd.DataFrame

pgfinder.matching.modification_generator(filtered_theo_df: pandas.DataFrame, mod_type: str) → pandas.DataFrame

Generate modified muropeptides (calculates new mass and adds modification tag to structure name).

Parameters:
  • filtered_theo_df (pd.DataFrame) – Pandas DataFrame of theoretical masses that have been filtered.

  • mod_type (str) – Modification type.

Returns:

Pandas DataFrame of modified muropeptides.

Return type:

pd.DataFrame
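
The two steps named in the summary (shift the mass, tag the name) can be sketched as follows. Everything here is an assumption for illustration: the helper `add_modification`, the column names, the tag format and the mass shift are hypothetical placeholders and do not reflect the modifications or naming scheme pgfinder actually uses.

```python
import pandas as pd


def add_modification(df: pd.DataFrame, tag: str, mass_delta: float,
                     name_col: str = "structure", mass_col: str = "mass") -> pd.DataFrame:
    """Return a copy of ``df`` with shifted masses and a tag appended to each name."""
    out = df.copy()
    out[mass_col] = out[mass_col] + mass_delta      # new mass after modification
    out[name_col] = out[name_col] + f" ({tag})"     # hypothetical tag format
    return out


# Placeholder structure, tag and mass shift, for illustration only.
monomers = pd.DataFrame({"structure": ["structure_a"], "mass": [100.0]})
modified = add_modification(monomers, tag="Example", mass_delta=1.5)
print(modified)
```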

pgfinder.matching.matching(ftrs_df: pandas.DataFrame, matching_df: pandas.DataFrame, set_ppm: int) → pandas.DataFrame

Match theoretical masses to observed masses within ppm tolerance.

Parameters:
  • ftrs_df (pd.DataFrame) – Features DataFrame.

  • matching_df (pd.DataFrame) – Matching DataFrame.

  • set_ppm (int) – PPM tolerance within which theoretical and observed masses are matched.

Returns:

Dataframe of matches.

Return type:

pd.DataFrame
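
Matching within a ppm tolerance can be sketched as pairing every observed feature with every theoretical structure and keeping the pairs whose mass difference falls inside the window. This is a simplified illustration, not pgfinder's code: `match_within_ppm` and the column names `id`, `obs_mass`, `structure` and `theo_mass` are hypothetical, and the window is assumed to be `theo_mass * ppm / 1000000`.

```python
import pandas as pd


def match_within_ppm(ftrs_df: pd.DataFrame, theo_df: pd.DataFrame, ppm: int,
                     obs_col: str = "obs_mass", theo_col: str = "theo_mass") -> pd.DataFrame:
    """Return all (feature, structure) pairs whose masses agree within ``ppm``."""
    pairs = ftrs_df.merge(theo_df, how="cross")           # every feature x every structure
    tolerance = pairs[theo_col] * ppm / 1_000_000         # assumed window per pair
    within = (pairs[obs_col] - pairs[theo_col]).abs() <= tolerance
    return pairs[within].reset_index(drop=True)


# Hypothetical data: only feature 1 lies within 10 ppm of a structure.
ftrs_df = pd.DataFrame({"id": [1, 2], "obs_mass": [1000.005, 2000.5]})
theo_df = pd.DataFrame({"structure": ["s1", "s2"], "theo_mass": [1000.0, 2000.0]})
matches = match_within_ppm(ftrs_df, theo_df, ppm=10)
print(matches)
```

A cross join is easy to read but grows with the product of the two table sizes, which is one reason to filter the theoretical table first.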

pgfinder.matching.clean_up(ftrs_df: pandas.DataFrame, mass_to_clean: decimal.Decimal, time_delta: float) → pandas.DataFrame

Clean up a DataFrame.

Parameters:
  • ftrs_df (pd.DataFrame) – Features dataframe.

  • mass_to_clean (Decimal) – Mass to be cleaned.

  • time_delta (float) – Clean up window.

Returns:

Tidied Dataframe.

Return type:

pd.DataFrame

pgfinder.matching.data_analysis(raw_data_df: pandas.DataFrame, theo_masses_df: pandas.DataFrame, rt_window: float, enabled_mod_list: list, ppm_tolerance: float, consolidation_ppm: float) → pandas.DataFrame

Perform analysis.

Parameters:
  • raw_data_df (pd.DataFrame) – User data as Pandas DataFrame.

  • theo_masses_df (pd.DataFrame) – Theoretical masses as Pandas DataFrame.

  • rt_window (float) – Set time window for in-source decay and salt adduct cleanup.

  • enabled_mod_list (list) – List of modifications to enable.

  • ppm_tolerance (float) – The ppm tolerance used when matching the theoretical masses of structures to observed ions.

  • consolidation_ppm (float) – The minimum absolute ppm difference between two matches before one is picked as “most likely” over the other.

Returns:

Dataframe of matches.

Return type:

pd.DataFrame

pgfinder.matching.calculate_ppm_delta(df: pandas.DataFrame, observed: str = COLUMNS['input']['obs'], theoretical: str = COLUMNS['inferred']['mass'], diff: str = COLUMNS['delta']) → pandas.DataFrame

Calculate the difference in Parts Per Million between observed and theoretical masses.

The PPM difference between observed and theoretical mass is calculated as…

\[\frac{1000000 \times (obs - theor)}{theor}\]

The function ensures the column is placed after the theoretical mass column to facilitate its use.

Parameters:
  • df (pd.DataFrame) – Pandas DataFrame of results.

  • observed (str) – Name of the column that holds the observed mass.

  • theoretical (str) – Name of the column that holds the theoretical mass.

  • diff (str) – Variable to be created that holds the difference in PPM.

Returns:

Pandas DataFrame with difference noted in column diff.

Return type:

pd.DataFrame
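
The formula and the column placement described above can be sketched with plain pandas. This is an illustrative re-implementation under assumptions rather than the library's code: `add_ppm_delta` and the column names `obs`, `theo`, `delta_ppm` and `intensity` are hypothetical stand-ins for the values read from `COLUMNS`.

```python
import pandas as pd


def add_ppm_delta(df: pd.DataFrame, observed: str = "obs",
                  theoretical: str = "theo", diff: str = "delta_ppm") -> pd.DataFrame:
    """Return a copy of ``df`` with the ppm difference placed after the theoretical column."""
    out = df.copy()
    delta = 1_000_000 * (out[observed] - out[theoretical]) / out[theoretical]
    # Insert directly after the theoretical mass column (column names assumed unique).
    position = out.columns.get_loc(theoretical) + 1
    out.insert(position, diff, delta)
    return out


# Hypothetical row: an observed mass 0.01 above a theoretical mass of 1000 is about 10 ppm.
results = pd.DataFrame({"obs": [1000.01], "theo": [1000.0], "intensity": [5.0]})
with_delta = add_ppm_delta(results)
print(with_delta)
```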

pgfinder.matching.pick_most_likely_structures(df: pandas.DataFrame, consolidation_ppm: float, columns: dict = COLUMNS) → pandas.DataFrame

Add rows that consolidate ambiguous matches, picking matches with the closest ppm.

Parameters:
  • df (pd.DataFrame) – DataFrame of structures to be processed.

  • consolidation_ppm (float) – Minimum Parts Per Million tolerance distinguishing matches.

  • columns (dict) – Dictionary of columns; this defaults to the global COLUMNS, which is read from ‘config/columns.yaml’ and simplifies extension to new formats.

Returns:

Dataframe of matches within the specified tolerance. Candidates that are not matched are included in the file for completeness.

Return type:

pd.DataFrame
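
One way to read the selection rule is: for each observed feature, find the candidate with the smallest absolute ppm difference, and treat any other candidate as still ambiguous if it is less than `consolidation_ppm` away from that best value. The sketch below illustrates only this rule by filtering rows; it does not reproduce how pgfinder adds consolidated rows, and `pick_closest` and the column names `id`, `structure` and `delta_ppm` are hypothetical.

```python
import pandas as pd


def pick_closest(df: pd.DataFrame, consolidation_ppm: float,
                 id_col: str = "id", ppm_col: str = "delta_ppm") -> pd.DataFrame:
    """Keep, per feature, the closest match and any match too close to rule out."""
    abs_ppm = df[ppm_col].abs()
    best = abs_ppm.groupby(df[id_col]).transform("min")   # best |ppm| per feature
    keep = (abs_ppm - best) < consolidation_ppm
    return df[keep].reset_index(drop=True)


# Hypothetical candidates: feature 1 has a clear winner, feature 2 stays ambiguous.
candidates = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "structure": ["A", "B", "C", "D"],
    "delta_ppm": [0.5, -4.0, 2.0, 2.3],
})
picked = pick_closest(candidates, consolidation_ppm=1.0)
print(picked)
```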

pgfinder.matching.consolidate_results(df: pandas.DataFrame, intensity_column: str = f"Intensity ({COLUMNS['best_match_suffix']})", structure_column: str = f"Inferred structure ({COLUMNS['best_match_suffix']})", rt_column: str = COLUMNS['input']['rt'], theo_column: str = COLUMNS['inferred']['mass'], ppm_column: str = COLUMNS['delta'], abundance_column: str = COLUMNS['consolidation']['Abundance (%)'], oligomer_column: str = 'Oligomerisation', total_column: str = COLUMNS['consolidation']['Total Intensity'], columns: dict = COLUMNS) → pandas.DataFrame

Add a final table of muropeptide structures and their relative abundances

Parameters:
  • df (pd.DataFrame) – DataFrame of structures to be processed.

  • intensity_column (str) – Intensity column.

  • structure_column (str) – Structure column.

  • rt_column (str) – RT column.

  • theo_column (str) – Theoretical Mass column.

  • ppm_column (str) – Delta ppm column.

  • abundance_column (str) – Abundance column.

  • oligomer_column (str) – Oligomer column.

  • total_column (str) – Total column.

Returns:

The input dataframe with additional columns containing the consolidated results.

Return type:

pd.DataFrame
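
Relative abundance can be sketched as each structure's summed intensity expressed as a percentage of the total intensity. This is a simplified illustration: it returns a separate summary table instead of appending columns to the input, and `relative_abundance` and the column names `structure`, `intensity` and `abundance_pct` are hypothetical.

```python
import pandas as pd


def relative_abundance(df: pd.DataFrame, structure_col: str = "structure",
                       intensity_col: str = "intensity") -> pd.DataFrame:
    """Sum intensity per structure and express it as a percentage of the total."""
    totals = df.groupby(structure_col, as_index=False)[intensity_col].sum()
    totals["abundance_pct"] = 100 * totals[intensity_col] / totals[intensity_col].sum()
    return totals.sort_values("abundance_pct", ascending=False).reset_index(drop=True)


# Hypothetical intensities: structure A is seen twice, structure B once.
best_matches = pd.DataFrame({"structure": ["A", "A", "B"], "intensity": [600, 200, 200]})
summary = relative_abundance(best_matches)
print(summary)
```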