`pgfinder.matching`

Matching functions

Module Contents

Functions

`calc_ppm_tolerance`(→ float)	Calculate the ppm tolerance value.
`filtered_theo`(→ pandas.DataFrame)	Generate list of observed structures from theoretical masses dataframe to reduce search space.
`multimer_builder`(→ pandas.DataFrame)	Generate multimers (dimers & trimers) from observed monomers.
`modification_generator`(→ pandas.DataFrame)	Generate modified muropeptides (calculates the new mass and adds a modification tag to the structure name).
`matching`(→ pandas.DataFrame)	Match theoretical masses to observed masses within ppm tolerance.
`clean_up`(→ pandas.DataFrame)	Clean up a DataFrame.
`data_analysis`(→ pandas.DataFrame)	Perform the full matching analysis.
`calculate_ppm_delta`(→ pandas.DataFrame)	Calculate the difference in Parts Per Million between observed and theoretical masses.
`pick_most_likely_structures`(→ pandas.DataFrame)	Add rows that consolidate ambiguous matches, picking matches with the closest ppm.
`consolidate_results`(→ pandas.DataFrame)	Add a final table of muropeptide structures and their relative abundances.

Attributes

`LOGGER`
`COLUMNS`

pgfinder.matching.LOGGER

pgfinder.matching.COLUMNS

Dictionary of pgfinder column names, loaded from ‘pgfinder/config/columns.yaml’.

pgfinder.matching.calc_ppm_tolerance(mw: float, ppm_tol: int = 10) → float

Calculate the ppm tolerance value.

Parameters:

mw (float) – Molecular weight.
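ppm_tol (int) – Tolerance in parts per million (ppm).

Returns:

Absolute mass tolerance, in the same units as mw, corresponding to ppm_tol parts per million of mw.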

Return type:

float
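The docstring does not state the formula. A minimal sketch, assuming the tolerance is the standard conversion of a ppm value into an absolute mass window at molecular weight mw (the usual definition, not necessarily pgfinder's exact code):

```python
def calc_ppm_tolerance(mw: float, ppm_tol: int = 10) -> float:
    """Return the absolute mass tolerance (same units as mw) for ppm_tol parts per million.

    Sketch only: assumes tolerance = mw * ppm_tol / 1e6.
    """
    # One ppm is one millionth of the molecular weight.
    return mw * ppm_tol / 1_000_000


# A 10 ppm window around a 1000 Da ion spans +/- 0.01 Da.
tolerance = calc_ppm_tolerance(1000.0, 10)
```

Because the window scales with mw, heavier ions get a proportionally wider absolute tolerance.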

pgfinder.matching.filtered_theo(ftrs_df: pandas.DataFrame, theo_df: pandas.DataFrame, user_ppm: int) → pandas.DataFrame

Generate a list of observed structures from the theoretical masses DataFrame to reduce the search space.

Parameters:

ftrs_df (pd.DataFrame) – Features dataframe.
theo_df (pd.DataFrame) – Theoretical dataframe.
user_ppm (int) – User-specified tolerance in parts per million (ppm).

Returns:

DataFrame filtered to the theoretical structures that match observed masses.

Return type:

pd.DataFrame
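The idea of shrinking the search space can be illustrated without pandas. In this hypothetical plain-Python sketch, theoretical entries are (name, mass) pairs and observed masses are a list; the window is assumed to be taken around the theoretical mass, and a sorted list with bisect avoids comparing every pair. The helper name filter_theoretical is an illustration, not part of pgfinder.

```python
from bisect import bisect_left


def filter_theoretical(observed_masses, theoretical, ppm_tol=10):
    """Keep theoretical (name, mass) pairs that match at least one observed mass.

    Hypothetical stand-in for filtered_theo; the tolerance window is assumed to be
    ppm_tol parts per million of the theoretical mass.
    """
    obs_sorted = sorted(observed_masses)
    kept = []
    for name, theo_mass in theoretical:
        tol = theo_mass * ppm_tol / 1_000_000
        # Index of the first observed mass at or above the lower edge of the window.
        i = bisect_left(obs_sorted, theo_mass - tol)
        if i < len(obs_sorted) and obs_sorted[i] <= theo_mass + tol:
            kept.append((name, theo_mass))
    return kept


observed = [2000.0, 1000.005]
candidates = [("A", 1000.0), ("B", 1500.0)]
kept = filter_theoretical(observed, candidates)  # only "A" lies within 10 ppm
```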

pgfinder.matching.multimer_builder(theo_df: pandas.DataFrame, multimer_type: str, columns: dict = COLUMNS) → pandas.DataFrame

Generate multimers (dimers & trimers) from observed monomers.

Parameters:

theo_df (pd.DataFrame) – Dataframe containing theoretical monomeric structures and their corresponding masses.
multimer_type (str) – Type of multimers to build.
columns (dict) – Dictionary of pgfinder columns, loaded by default from ‘pgfinder/config/columns.yaml’.

Returns:

Dataframe containing theoretical multimers and their corresponding masses.

Return type:

pd.DataFrame
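As a rough illustration of how dimers and trimers can be enumerated from monomers, the sketch below combines every monomer with replacement and sums the masses. It assumes each bond between monomers is a condensation releasing one water molecule (18.010565 Da, monoisotopic); pgfinder's real multimer rules, naming and multimer_type values may differ, and build_multimers is a hypothetical helper that takes an integer size instead of a type string.

```python
from itertools import combinations_with_replacement

# Assumption: each new bond releases one water molecule (monoisotopic H2O, in Da).
WATER = 18.010565


def build_multimers(monomers, size, bond_loss=WATER):
    """Return {name: mass} for every multimer of `size` monomers (2 = dimer, 3 = trimer).

    Hypothetical illustration: names are joined with '-', masses are summed and
    `bond_loss` is subtracted once per bond (size - 1 bonds in a linear multimer).
    """
    multimers = {}
    for combo in combinations_with_replacement(sorted(monomers), size):
        name = "-".join(combo)
        mass = sum(monomers[m] for m in combo) - (size - 1) * bond_loss
        multimers[name] = mass
    return multimers


monomers = {"A": 500.0, "B": 700.0}
dimers = build_multimers(monomers, size=2)   # keys: 'A-A', 'A-B', 'B-B'
trimers = build_multimers(monomers, size=3)  # keys: 'A-A-A', 'A-A-B', 'A-B-B', 'B-B-B'
```

Using combinations with replacement avoids listing the same dimer twice (A-B and B-A) while still including homodimers such as A-A.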

pgfinder.matching.modification_generator(filtered_theo_df: pandas.DataFrame, mod_type: str) → pandas.DataFrame

Generate modified muropeptides (calculates the new mass and adds a modification tag to the structure name).

Parameters:

filtered_theo_df (pd.DataFrame) – Pandas DataFrame of theoretical masses that have been filtered.
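mod_type (str) – Type of modification to apply.

Returns:

Pandas DataFrame of modified muropeptides, with updated masses and modification tags added to their structure names.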

Return type:

pd.DataFrame
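The two steps in the summary (shift the mass, tag the name) can be sketched on a plain dictionary. The helper apply_modification, the "|" tag separator and the example mass shift are all assumptions for illustration; pgfinder's own tag format and modification masses are not shown here.

```python
def apply_modification(structures, tag, mass_delta):
    """Return a new {name: mass} dict with one modification applied to every structure.

    Hypothetical helper: `tag` is appended to each name and `mass_delta` (Da) is
    added to each mass. The tag format is an assumption, not pgfinder's.
    """
    return {f"{name}|{tag}": mass + mass_delta for name, mass in structures.items()}


# Example: a modification assumed to remove one water molecule.
modified = apply_modification({"A": 500.0}, tag="mod", mass_delta=-18.010565)
```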

pgfinder.matching.matching(ftrs_df: pandas.DataFrame, matching_df: pandas.DataFrame, set_ppm: int) → pandas.DataFrame

Match theoretical masses to observed masses within ppm tolerance.

Parameters:

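ftrs_df (pd.DataFrame) – Features DataFrame of observed masses.
matching_df (pd.DataFrame) – DataFrame of theoretical masses to match against.
set_ppm (int) – Tolerance in parts per million (ppm) within which masses are considered a match.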

Returns:

Dataframe of matches.

Return type:

pd.DataFrame
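A hypothetical plain-Python sketch of ppm-window matching follows. It assumes the window is ppm_tol parts per million of the observed mass and returns every theoretical candidate inside it, so one observed ion can match several structures (the ambiguity that later consolidation resolves). match_masses is an illustrative name, not pgfinder's API.

```python
def match_masses(observed, theoretical, ppm_tol=10):
    """Pair each observed mass with every theoretical (name, mass) within ppm_tol.

    Hypothetical stand-in for matching(); the window is assumed to be centred on
    the observed mass.
    """
    matches = []
    for obs in observed:
        tol = obs * ppm_tol / 1_000_000
        for name, theo in theoretical:
            if abs(obs - theo) <= tol:
                matches.append((obs, name, theo))
    return matches


theoretical = [("A", 1000.0), ("B", 1000.008), ("C", 1001.0)]
matches = match_masses([1000.005], theoretical)  # "A" and "B" both fall inside the window
```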

pgfinder.matching.clean_up(ftrs_df: pandas.DataFrame, mass_to_clean: decimal.Decimal, time_delta: float) → pandas.DataFrame

Clean up a features DataFrame for a given mass within a retention-time window (see rt_window in data_analysis).

Parameters:

ftrs_df (pd.DataFrame) – Features dataframe.
mass_to_clean (Decimal) – Mass to be cleaned.
time_delta (float) – Retention-time window used for the cleanup.

Returns:

Tidied DataFrame.

Return type:

pd.DataFrame
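The docstring does not describe the cleanup rule itself. As a hypothetical illustration of salt-adduct cleanup in general (not pgfinder's algorithm), the sketch below treats a feature whose mass is a parent's mass plus mass_shift, eluting within time_delta of the parent, as an adduct: its intensity is credited to the parent and the adduct row is dropped. Features are (mass, rt, intensity) tuples, and the 0.001 Da matching tolerance is an assumed value.

```python
# Assumed mass tolerance (Da) when comparing a feature to parent mass + mass_shift.
MASS_TOL = 0.001


def fold_adducts(features, mass_shift, time_delta):
    """Fold adduct features into their parent feature (hypothetical sketch)."""
    rows = [list(f) for f in features]  # mutable copies: [mass, rt, intensity]
    dropped = set()
    for i, parent in enumerate(rows):
        if i in dropped:
            continue
        for j, other in enumerate(rows):
            if j == i or j in dropped:
                continue
            same_shift = abs(other[0] - (parent[0] + mass_shift)) <= MASS_TOL
            same_time = abs(other[1] - parent[1]) <= time_delta
            if same_shift and same_time:
                parent[2] += other[2]  # credit the adduct's intensity to the parent
                dropped.add(j)
    return [tuple(r) for k, r in enumerate(rows) if k not in dropped]


features = [(1000.0, 5.0, 100.0), (1021.982, 5.1, 20.0), (1021.982, 9.0, 7.0)]
# 21.982 Da is roughly the Na+/H+ mass difference; the third feature elutes too late.
cleaned = fold_adducts(features, mass_shift=21.982, time_delta=0.5)
```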

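pgfinder.matching.data_analysis(raw_data_df: pandas.DataFrame, theo_masses_df: pandas.DataFrame, rt_window: float, enabled_mod_list: list, ppm_tolerance: float, consolidation_ppm: float) → pandas.DataFrame

Perform the full matching analysis of user data against theoretical masses, using the retention-time window, modifications and ppm settings below.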

Parameters:

raw_data_df (pd.DataFrame) – User data as Pandas DataFrame.
theo_masses_df (pd.DataFrame) – Theoretical masses as Pandas DataFrame.
rt_window (float) – Time window used for in-source decay and salt adduct cleanup.
enabled_mod_list (list) – List of modifications to enable.
ppm_tolerance (float) – The ppm tolerance used when matching the theoretical masses of structures to observed ions.
consolidation_ppm (float) – The minimum absolute ppm difference between two matches before one is picked as “most likely” over the other.

Returns:

Dataframe of matches.

Return type:

pd.DataFrame

pgfinder.matching.calculate_ppm_delta(df: pandas.DataFrame, observed: str = COLUMNS['input']['obs'], theoretical: str = COLUMNS['inferred']['mass'], diff: str = COLUMNS['delta']) → pandas.DataFrame

Calculate the difference in Parts Per Million between observed and theoretical masses.

The PPM difference between observed and theoretical mass is calculated as…

\[\Delta_{\mathrm{ppm}} = \frac{10^{6}\,(\mathrm{obs} - \mathrm{theor})}{\mathrm{theor}}\]

The function ensures the column is placed after the theoretical mass column to facilitate its use.

Parameters:

df (pd.DataFrame) – Pandas DataFrame of results.
observed (str) – Name of the column holding the observed masses.
theoretical (str) – Name of the column holding the theoretical masses.
diff (str) – Name of the column to create, holding the difference in ppm.

Returns:

Pandas DataFrame with difference noted in column diff.

Return type:

pd.DataFrame
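The formula and the column placement can be checked with a small plain-Python sketch. It works on a single row stored as a dict rather than a pandas DataFrame, and the column names obs, theo and delta_ppm are hypothetical placeholders for the COLUMNS entries.

```python
def add_ppm_delta(row, observed="obs", theoretical="theo", diff="delta_ppm"):
    """Return a copy of `row` with the ppm difference inserted right after `theoretical`.

    Sketch only: `row` is a dict and the column names are hypothetical.
    """
    delta = 1_000_000 * (row[observed] - row[theoretical]) / row[theoretical]
    out = {}
    for key, value in row.items():
        out[key] = value
        if key == theoretical:
            out[diff] = delta  # place the new column immediately after the theoretical mass
    return out


result = add_ppm_delta({"obs": 1000.01, "theo": 1000.0, "intensity": 5.0})
# Keys in order: obs, theo, delta_ppm, intensity; delta_ppm is about 10 ppm.
```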

pgfinder.matching.pick_most_likely_structures(df: pandas.DataFrame, consolidation_ppm: float, columns: dict = COLUMNS) → pandas.DataFrame

Add rows that consolidate ambiguous matches, picking matches with the closest ppm.

Parameters:

df (pd.DataFrame) – DataFrame of structures to be processed.
consolidation_ppm (float) – Minimum absolute ppm difference between two matches before one is picked as the most likely.
columns (dict) – Dictionary of columns; defaults to the global COLUMNS, which is read from ‘config/columns.yaml’ and simplifies extension to new formats.

Returns:

Dataframe of matches within the specified tolerance. Candidates that are not matched are included in the file for completeness.

Return type:

pd.DataFrame
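Based on the description of consolidation_ppm under data_analysis, one plausible reading of the selection rule is sketched below for the candidates of a single observed ion: the closest match always survives, and any other candidate whose absolute ppm error is less than consolidation_ppm worse than the best is kept as still ambiguous. This is an interpretation for illustration, and most_likely is a hypothetical helper, not pgfinder's code.

```python
def most_likely(candidates, consolidation_ppm):
    """Return the candidate names kept for one observed ion.

    `candidates` is a list of (name, ppm_delta) pairs. Hypothetical sketch: the
    candidate with the smallest absolute ppm error is always kept; others are kept
    only if they are within consolidation_ppm of it (i.e. still ambiguous).
    """
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda c: abs(c[1]))
    best = abs(ranked[0][1])
    kept = [ranked[0][0]]
    kept += [name for name, ppm in ranked[1:] if abs(ppm) - best < consolidation_ppm]
    return kept


clear_winner = most_likely([("A", 1.0), ("B", -5.0)], consolidation_ppm=2.0)  # ["A"]
ambiguous = most_likely([("A", 1.0), ("B", -2.5)], consolidation_ppm=2.0)     # ["A", "B"]
```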

pgfinder.matching.consolidate_results(df: pandas.DataFrame, intensity_column: str = f"Intensity ({COLUMNS['best_match_suffix']})", structure_column: str = f"Inferred structure ({COLUMNS['best_match_suffix']})", rt_column: str = COLUMNS['input']['rt'], theo_column: str = COLUMNS['inferred']['mass'], ppm_column: str = COLUMNS['delta'], abundance_column: str = COLUMNS['consolidation']['Abundance (%)'], oligomer_column: str = 'Oligomerisation', total_column: str = COLUMNS['consolidation']['Total Intensity'], columns: dict = COLUMNS) → pandas.DataFrame

Add a final table of muropeptide structures and their relative abundances.

Parameters:

df (pd.DataFrame) – DataFrame of structures to be processed.
intensity_column (str) – Intensity column.
structure_column (str) – Structure column.
rt_column (str) – RT column.
theo_column (str) – Theoretical Mass column.
ppm_column (str) – Delta ppm column.
abundance_column (str) – Abundance column.
oligomer_column (str) – Oligomer column.
total_column (str) – Total column.
columns (dict) – Dictionary of pgfinder columns; defaults to the global COLUMNS.

Returns:

The input dataframe with additional columns containing the consolidated results.

Return type:

pd.DataFrame
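Relative abundance here is assumed to mean each structure's summed intensity as a percentage of the total intensity. A minimal plain-Python sketch of that calculation follows; rows are (structure, intensity) pairs standing in for the best-match structure and intensity columns, and relative_abundance is a hypothetical helper.

```python
from collections import defaultdict


def relative_abundance(rows):
    """Sum intensity per structure and express each as a percentage of the total.

    Hypothetical sketch: `rows` is a list of (structure, intensity) pairs.
    """
    totals = defaultdict(float)
    for structure, intensity in rows:
        totals[structure] += intensity
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {s: 0.0 for s in totals}  # avoid dividing by zero on empty signal
    return {s: 100 * i / grand_total for s, i in totals.items()}


abundances = relative_abundance([("A", 20.0), ("B", 10.0), ("A", 10.0)])
# "A" accounts for 30 of 40 intensity units (75 %), "B" for 10 (25 %).
```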
