pgfinder.matching
=================

.. py:module:: pgfinder.matching

.. autoapi-nested-parse::

   Matching functions

   ..
       !! processed by numpydoc !!


Attributes
----------

.. autoapisummary::

   pgfinder.matching.LOGGER
   pgfinder.matching.COLUMNS


Functions
---------

.. autoapisummary::

   pgfinder.matching.calc_ppm_tolerance
   pgfinder.matching.filtered_theo
   pgfinder.matching.multimer_builder
   pgfinder.matching.modification_generator
   pgfinder.matching.matching
   pgfinder.matching.clean_up
   pgfinder.matching.data_analysis
   pgfinder.matching.calculate_ppm_delta
   pgfinder.matching.pick_most_likely_structures
   pgfinder.matching.consolidate_results


Module Contents
---------------

.. py:data:: LOGGER

.. py:data:: COLUMNS

.. py:function:: calc_ppm_tolerance(mw: float, ppm_tol: int = 10) -> float

   
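   Calculate the ppm tolerance value for a given molecular weight.

   :param mw: Molecular weight.
   :type mw: float
   :param ppm_tol: Tolerance in parts per million (ppm).
   :type ppm_tol: int

   :returns: Tolerance value for ``mw`` at ``ppm_tol`` ppm.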
   :rtype: float
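
   As a rough illustration, the sketch below assumes the conventional definition of a ppm tolerance, ``mw * ppm_tol / 1_000_000``. The helper name ``ppm_tolerance_sketch`` is hypothetical and is not part of pgfinder; the library's own calculation may differ.

   .. code-block:: python

      def ppm_tolerance_sketch(mw: float, ppm_tol: int = 10) -> float:
          """Return the absolute mass tolerance for ``mw`` at ``ppm_tol`` ppm.

          Assumption: tolerance = mw * ppm_tol / 1e6 (conventional definition).
          """
          return mw * ppm_tol / 1_000_000


      # A species of mass 1000 at 10 ppm gives a window of about +/- 0.01.
      tolerance = ppm_tolerance_sketch(1000.0, ppm_tol=10)
      print(f"{tolerance:.4f}")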


   ..
       !! processed by numpydoc !!

.. py:function:: filtered_theo(ftrs_df: pandas.DataFrame, theo_df: pandas.DataFrame, user_ppm: int) -> pandas.DataFrame

   
   Generate a list of observed structures from the theoretical masses dataframe to reduce the search space.

   :param ftrs_df: Features dataframe.
   :type ftrs_df: pd.DataFrame
   :param theo_df: Theoretical dataframe.
   :type theo_df: pd.DataFrame
   :param user_ppm: User-specified tolerance in parts per million (ppm).
   :type user_ppm: int

   :returns: Dataframe filtered on matches with theoretical masses.
   :rtype: pd.DataFrame


   ..
       !! processed by numpydoc !!

.. py:function:: multimer_builder(theo_df: pandas.DataFrame, multimer_type: str, columns: dict = COLUMNS) -> pandas.DataFrame

   
   Generate multimers (dimers & trimers) from observed monomers.

   :param theo_df: Dataframe containing theoretical monomeric structures and their corresponding masses.
   :type theo_df: pd.DataFrame
   :param multimer_type: Type of multimers to build.
   :type multimer_type: str
   :param columns: Dictionary of pgfinder columns, loaded by default from 'pgfinder/config/columns.yaml'.
   :type columns: dict

   :returns: Dataframe containing theoretical multimers and their corresponding masses.
   :rtype: pd.DataFrame


   ..
       !! processed by numpydoc !!

.. py:function:: modification_generator(filtered_theo_df: pandas.DataFrame, mod_type: str) -> pandas.DataFrame

   
   Generate modified muropeptides (calculates the new mass and adds a modification tag to the structure name).

   :param filtered_theo_df: Pandas DataFrame of theoretical masses that have been filtered.
   :type filtered_theo_df: pd.DataFrame
   :param mod_type: Type of modification to apply.
   :type mod_type: str

   :returns: Pandas DataFrame of modified muropeptides.
   :rtype: pd.DataFrame


   ..
       !! processed by numpydoc !!

.. py:function:: matching(ftrs_df: pandas.DataFrame, matching_df: pandas.DataFrame, set_ppm: int) -> pandas.DataFrame

   
   Match theoretical masses to observed masses within ppm tolerance.

   :param ftrs_df: Features DataFrame.
   :type ftrs_df: pd.DataFrame
   :param matching_df: Matching DataFrame.
   :type matching_df: pd.DataFrame
   :param set_ppm: Tolerance in parts per million (ppm) used for matching.
   :type set_ppm: int

   :returns: Dataframe of matches.
   :rtype: pd.DataFrame
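
   The idea of matching within a ppm tolerance can be sketched in plain Python. This is only an illustration, not the library's implementation: the function ``match_masses``, the structure names, and the choice to centre the window on the observed mass are all assumptions made for the example.

   .. code-block:: python

      def match_masses(observed, theoretical, ppm=10):
          """Pair each observed mass with the theoretical structures in its ppm window."""
          matches = {}
          for obs in observed:
              # Assumption: the tolerance window is centred on the observed mass.
              tolerance = obs * ppm / 1_000_000
              matches[obs] = [
                  name
                  for name, mass in theoretical.items()
                  if abs(mass - obs) <= tolerance
              ]
          return matches


      # Hypothetical theoretical masses keyed by structure name.
      theoretical = {
          "structure_a": 1000.000,
          "structure_b": 1000.008,
          "structure_c": 1500.000,
      }
      result = match_masses([1000.005, 1499.900], theoretical, ppm=10)
      print(result)

   In this sketch an observed mass can match several structures, or none at all, which is why a later consolidation step is useful.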


   ..
       !! processed by numpydoc !!

.. py:function:: clean_up(ftrs_df: pandas.DataFrame, mass_to_clean: decimal.Decimal, time_delta: float) -> pandas.DataFrame

   
   Clean up a DataFrame.

   :param ftrs_df: Features dataframe.
   :type ftrs_df: pd.DataFrame
   :param mass_to_clean: Mass to be cleaned.
   :type mass_to_clean: Decimal
   :param time_delta: Clean up window.
   :type time_delta: float

   :returns: Tidied Dataframe.
   :rtype: pd.DataFrame


   ..
       !! processed by numpydoc !!

.. py:function:: data_analysis(raw_data_df: pandas.DataFrame, theo_masses_df: pandas.DataFrame, rt_window: float, enabled_mod_list: list, ppm_tolerance: float, consolidation_ppm: float) -> pandas.DataFrame

   
   Perform the analysis, matching the theoretical masses of structures to observed ions.

   :param raw_data_df: User data as Pandas DataFrame.
   :type raw_data_df: pd.DataFrame
   :param theo_masses_df: Theoretical masses as Pandas DataFrame.
   :type theo_masses_df: pd.DataFrame
   :param rt_window: Set time window for in-source decay and salt adduct cleanup.
   :type rt_window: float
   :param enabled_mod_list: List of modifications to enable.
   :type enabled_mod_list: list
   :param ppm_tolerance: The ppm tolerance used when matching the theoretical masses of structures to observed ions.
   :type ppm_tolerance: float
   :param consolidation_ppm: The minimum absolute ppm difference between two matches before one is picked as "most likely" over the other.
   :type consolidation_ppm: float

   :returns: Dataframe of matches.
   :rtype: pd.DataFrame


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_ppm_delta(df: pandas.DataFrame, observed: str = COLUMNS['input']['obs'], theoretical: str = COLUMNS['inferred']['mass'], diff: str = COLUMNS['delta']) -> pandas.DataFrame

   
   Calculate the difference in Parts Per Million between observed and theoretical masses.

   The ppm difference between observed and theoretical mass is calculated as:

   .. math:: (1000000 * (obs - theor)) / theor

   The function ensures the column is placed after the theoretical mass column to facilitate its use.

   :param df: Pandas DataFrame of results.
   :type df: pd.DataFrame
   :param observed: Name of the column that holds the observed mass.
   :type observed: str
   :param theoretical: Name of the column that holds the theoretical mass.
   :type theoretical: str
   :param diff: Name of the column to be created that holds the difference in ppm.
   :type diff: str

   :returns: Pandas DataFrame with difference noted in column diff.
   :rtype: pd.DataFrame
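
   The formula above can be checked on single values with a few lines of plain Python. The helper ``ppm_delta`` is a hypothetical stand-in that works on scalars rather than DataFrame columns.

   .. code-block:: python

      def ppm_delta(obs: float, theor: float) -> float:
          """Return the ppm difference: (1000000 * (obs - theor)) / theor."""
          return (1_000_000 * (obs - theor)) / theor


      # An observed mass slightly above the theoretical mass gives a positive delta,
      # one slightly below gives a negative delta.
      above = ppm_delta(1000.01, 1000.0)
      below = ppm_delta(999.99, 1000.0)
      print(round(above, 3), round(below, 3))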


   ..
       !! processed by numpydoc !!

.. py:function:: pick_most_likely_structures(df: pandas.DataFrame, consolidation_ppm: float, columns: dict = COLUMNS) -> pandas.DataFrame

   
   Add rows that consolidate ambiguous matches, picking matches with the closest ppm.

   :param df: DataFrame of structures to be processed.
   :type df: pd.DataFrame
   :param consolidation_ppm: Minimum Parts Per Million tolerance distinguishing matches.
   :type consolidation_ppm: float
   :param columns: Dictionary of columns. Defaults to the global ``COLUMNS``, which is read from 'config/columns.yaml' and simplifies extension to new formats.
   :type columns: dict

   :returns: Dataframe of matches within the specified tolerance. Candidates that are not matched are included
             in the file for completeness.
   :rtype: pd.DataFrame
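
   One way to read the role of ``consolidation_ppm`` is sketched below. It assumes that candidates are compared by absolute ppm delta and that any candidate closer than ``consolidation_ppm`` to the best one is kept as equally likely. The function ``pick_closest`` and the sample data are hypothetical, and the library's actual selection logic may differ.

   .. code-block:: python

      def pick_closest(candidates, consolidation_ppm):
          """Keep the candidates that cannot be told apart from the best match."""
          # Smallest absolute ppm delta among all candidates.
          best = min(abs(ppm) for _, ppm in candidates)
          return [
              name
              for name, ppm in candidates
              # Assumption: a candidate stays if it is less than
              # consolidation_ppm further away than the best one.
              if abs(ppm) - best < consolidation_ppm
          ]


      candidates = [("structure_a", 0.5), ("structure_b", -0.8), ("structure_c", 6.0)]
      loose = pick_closest(candidates, consolidation_ppm=1.0)
      strict = pick_closest(candidates, consolidation_ppm=0.1)
      print(loose, strict)

   With the looser threshold the two close candidates remain ambiguous; with the stricter one only the closest candidate is kept.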


   ..
       !! processed by numpydoc !!

.. py:function:: consolidate_results(df: pandas.DataFrame, intensity_column: str = f"Intensity ({COLUMNS['best_match_suffix']})", structure_column: str = f"Inferred structure ({COLUMNS['best_match_suffix']})", rt_column: str = COLUMNS['input']['rt'], theo_column: str = COLUMNS['inferred']['mass'], ppm_column: str = COLUMNS['delta'], abundance_column: str = COLUMNS['consolidation']['Abundance (%)'], oligomer_column: str = 'Oligomerisation', total_column: str = COLUMNS['consolidation']['Total Intensity'], columns: dict = COLUMNS) -> pandas.DataFrame

   
   Add a final table of muropeptide structures and their relative abundances.

   :param df: DataFrame of structures to be processed.
   :type df: pd.DataFrame
   :param intensity_column: Intensity column.
   :type intensity_column: str
   :param structure_column: Structure column.
   :type structure_column: str
   :param rt_column: Retention time (RT) column.
   :type rt_column: str
   :param theo_column: Theoretical Mass column.
   :type theo_column: str
   :param ppm_column: Delta ppm column.
   :type ppm_column: str
   :param abundance_column: Abundance column.
   :type abundance_column: str
   :param oligomer_column: Oligomer column.
   :type oligomer_column: str
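   :param total_column: Total intensity column.
   :type total_column: str
   :param columns: Dictionary of pgfinder columns; defaults to the global ``COLUMNS``.
   :type columns: dict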

   :returns: The input dataframe with additional columns containing the consolidated results.
   :rtype: pd.DataFrame
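
   A relative abundance can be illustrated as each structure's share of the total intensity, expressed as a percentage. This is an assumption based on the ``Abundance (%)`` and ``Total Intensity`` column names; the helper ``relative_abundance`` and the sample intensities are hypothetical.

   .. code-block:: python

      def relative_abundance(intensities):
          """Return each structure's intensity as a percentage of the total."""
          total = sum(intensities.values())
          return {name: 100 * value / total for name, value in intensities.items()}


      # Hypothetical summed intensities per structure.
      intensities = {"structure_a": 600.0, "structure_b": 300.0, "structure_c": 100.0}
      abundance = relative_abundance(intensities)
      print(abundance)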


   ..
       !! processed by numpydoc !!