pgfinder.matching
=================

.. py:module:: pgfinder.matching

.. autoapi-nested-parse::

   Matching functions

   .. !! processed by numpydoc !!


Attributes
----------

.. autoapisummary::

   pgfinder.matching.LOGGER
   pgfinder.matching.COLUMNS


Functions
---------

.. autoapisummary::

   pgfinder.matching.calc_ppm_tolerance
   pgfinder.matching.filtered_theo
   pgfinder.matching.multimer_builder
   pgfinder.matching.modification_generator
   pgfinder.matching.matching
   pgfinder.matching.clean_up
   pgfinder.matching.data_analysis
   pgfinder.matching.calculate_ppm_delta
   pgfinder.matching.pick_most_likely_structures
   pgfinder.matching.consolidate_results


Module Contents
---------------

.. py:data:: LOGGER

.. py:data:: COLUMNS

.. py:function:: calc_ppm_tolerance(mw: float, ppm_tol: int = 10) -> float

   Calculate the ppm tolerance value.

   :param mw: Molecular weight.
   :type mw: float
   :param ppm_tol: PPM tolerance.
   :type ppm_tol: int

   :returns: ?
   :rtype: float

   .. !! processed by numpydoc !!

.. py:function:: filtered_theo(ftrs_df: pandas.DataFrame, theo_df: pandas.DataFrame, user_ppm: int) -> pandas.DataFrame

   Generate a list of observed structures from the theoretical masses dataframe to reduce the search space.

   :param ftrs_df: Features dataframe.
   :type ftrs_df: pd.DataFrame
   :param theo_df: Theoretical dataframe.
   :type theo_df: pd.DataFrame
   :param user_ppm: User-specified parts per million (ppm) tolerance.
   :type user_ppm: int

   :returns: Dataframe filtered on matches with theoretical masses.
   :rtype: pd.DataFrame

   .. !! processed by numpydoc !!

.. py:function:: multimer_builder(theo_df: pandas.DataFrame, multimer_type: str, columns: dict = COLUMNS) -> pandas.DataFrame

   Generate multimers (dimers & trimers) from observed monomers.

   :param theo_df: Dataframe containing theoretical monomeric structures and their corresponding masses.
   :type theo_df: pd.DataFrame
   :param multimer_type: Type of multimers to build.
   :type multimer_type: str
   :param columns: Dictionary of pgfinder columns, loaded by default from 'pgfinder/config/columns.yaml'.
   :type columns: dict

   :returns: Dataframe containing theoretical multimers and their corresponding masses.
   :rtype: pd.DataFrame

   .. !! processed by numpydoc !!

.. py:function:: modification_generator(filtered_theo_df: pandas.DataFrame, mod_type: str) -> pandas.DataFrame

   Generate modified muropeptides (calculates the new mass and adds a modification tag to the structure name).

   :param filtered_theo_df: Pandas DataFrame of theoretical masses that have been filtered.
   :type filtered_theo_df: pd.DataFrame
   :param mod_type: Modification type ???.
   :type mod_type: str

   :returns: Pandas DataFrame of ???.
   :rtype: pd.DataFrame

   .. !! processed by numpydoc !!

.. py:function:: matching(ftrs_df: pandas.DataFrame, matching_df: pandas.DataFrame, set_ppm: int) -> pandas.DataFrame

   Match theoretical masses to observed masses within ppm tolerance.

   :param ftrs_df: Features DataFrame.
   :type ftrs_df: pd.DataFrame
   :param matching_df: Matching DataFrame.
   :type matching_df: pd.DataFrame
   :param set_ppm: PPM tolerance used for matching.
   :type set_ppm: int

   :returns: Dataframe of matches.
   :rtype: pd.DataFrame

   .. !! processed by numpydoc !!

.. py:function:: clean_up(ftrs_df: pandas.DataFrame, mass_to_clean: decimal.Decimal, time_delta: float) -> pandas.DataFrame

   Clean up a DataFrame.

   :param ftrs_df: Features dataframe?
   :type ftrs_df: pd.DataFrame
   :param mass_to_clean: Mass to be cleaned.
   :type mass_to_clean: Decimal
   :param time_delta: Clean up window.
   :type time_delta: float

   :returns: Tidied Dataframe.
   :rtype: pd.DataFrame

   .. !! processed by numpydoc !!

.. py:function:: data_analysis(raw_data_df: pandas.DataFrame, theo_masses_df: pandas.DataFrame, rt_window: float, enabled_mod_list: list, ppm_tolerance: float, consolidation_ppm: float) -> pandas.DataFrame

   Perform analysis.

   :param raw_data_df: User data as Pandas DataFrame.
   :type raw_data_df: pd.DataFrame
   :param theo_masses_df: Theoretical masses as Pandas DataFrame.
   :type theo_masses_df: pd.DataFrame
   :param rt_window: Set time window for in-source decay and salt adduct cleanup.
   :type rt_window: float
   :param enabled_mod_list: List of modifications to enable.
   :type enabled_mod_list: list
   :param ppm_tolerance: The ppm tolerance used when matching the theoretical masses of structures to observed ions.
   :type ppm_tolerance: float
   :param consolidation_ppm: The minimum absolute ppm difference between two matches before one is picked as "most likely" over the other.
   :type consolidation_ppm: float

   :returns: Dataframe of matches.
   :rtype: pd.DataFrame

   .. !! processed by numpydoc !!

.. py:function:: calculate_ppm_delta(df: pandas.DataFrame, observed: str = COLUMNS['input']['obs'], theoretical: str = COLUMNS['inferred']['mass'], diff: str = COLUMNS['delta']) -> pandas.DataFrame

   Calculate the difference in Parts Per Million between observed and theoretical masses.

   The PPM difference between observed and theoretical mass is calculated as

   .. math::

      \frac{1000000 \times (obs - theor)}{theor}

   The function ensures the column is placed after the theoretical mass column to facilitate its use.

   :param df: Pandas DataFrame of results.
   :type df: pd.DataFrame
   :param observed: Name of the column that holds the observed mass.
   :type observed: str
   :param theoretical: Name of the column that holds the theoretical mass.
   :type theoretical: str
   :param diff: Name of the column to be created that holds the difference in PPM.
   :type diff: str

   :returns: Pandas DataFrame with difference noted in column diff.
   :rtype: pd.DataFrame

   .. !! processed by numpydoc !!

.. py:function:: pick_most_likely_structures(df: pandas.DataFrame, consolidation_ppm: float, columns: dict = COLUMNS) -> pandas.DataFrame

   Add rows that consolidate ambiguous matches, picking matches with the closest ppm.

   :param df: DataFrame of structures to be processed.
   :type df: pd.DataFrame
   :param consolidation_ppm: Minimum Parts Per Million tolerance distinguishing matches.
   :type consolidation_ppm: float
   :param columns: Dictionary of columns. Defaults to the global COLUMNS, which is read from 'config/columns.yaml' and simplifies extension to new formats.
   :type columns: dict

   :returns: Dataframe of matches within the specified tolerance. Candidates that are not matched are included in the file for completeness.
   :rtype: pd.DataFrame

   .. !! processed by numpydoc !!

.. py:function:: consolidate_results(df: pandas.DataFrame, intensity_column: str = f"Intensity ({COLUMNS['best_match_suffix']})", structure_column: str = f"Inferred structure ({COLUMNS['best_match_suffix']})", rt_column: str = COLUMNS['input']['rt'], theo_column: str = COLUMNS['inferred']['mass'], ppm_column: str = COLUMNS['delta'], abundance_column: str = COLUMNS['consolidation']['Abundance (%)'], oligomer_column: str = 'Oligomerisation', total_column: str = COLUMNS['consolidation']['Total Intensity'], columns: dict = COLUMNS) -> pandas.DataFrame

   Add a final table of muropeptide structures and their relative abundances.

   :param df: DataFrame of structures to be processed.
   :type df: pd.DataFrame
   :param intensity_column: Intensity column.
   :type intensity_column: str
   :param structure_column: Structure column.
   :type structure_column: str
   :param rt_column: RT column.
   :type rt_column: str
   :param theo_column: Theoretical Mass column.
   :type theo_column: str
   :param ppm_column: Delta ppm column.
   :type ppm_column: str
   :param abundance_column: Abundance column.
   :type abundance_column: str
   :param oligomer_column: Oligomer column.
   :type oligomer_column: str
   :param total_column: Total column.
   :type total_column: str

   :returns: The input dataframe with additional columns containing the consolidated results.
   :rtype: pd.DataFrame

   .. !! processed by numpydoc !!
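
As an illustration of the ppm arithmetic used throughout this module, the following minimal sketch applies the formula documented for ``calculate_ppm_delta`` to plain floats. The helper names ``ppm_delta``, ``ppm_window`` and ``within_tolerance`` are hypothetical and are not part of pgfinder. The tolerance window in particular is an assumption (``ppm_tol`` parts per million of the mass); it may differ from what ``calc_ppm_tolerance`` actually returns.

```python
def ppm_delta(observed: float, theoretical: float) -> float:
    """Signed ppm difference: (1000000 * (obs - theor)) / theor."""
    return (1_000_000 * (observed - theoretical)) / theoretical


def ppm_window(mw: float, ppm_tol: int = 10) -> float:
    """Assumed absolute mass tolerance: ppm_tol parts per million of mw."""
    return mw * ppm_tol / 1_000_000


def within_tolerance(observed: float, theoretical: float, ppm_tol: int = 10) -> bool:
    """True if the observed mass lies inside the assumed window around the theoretical mass."""
    return abs(observed - theoretical) <= ppm_window(theoretical, ppm_tol)


if __name__ == "__main__":
    # Illustrative masses only, not taken from any pgfinder data set.
    print(ppm_delta(1000.005, 1000.0))         # roughly 5 ppm
    print(within_tolerance(1000.005, 1000.0))  # inside a 10 ppm window
    print(within_tolerance(1000.02, 1000.0))   # roughly 20 ppm, outside the window
```

The delta is signed, so an observed mass below the theoretical mass gives a negative value. The tolerance check compares absolute differences.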
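
The role of ``consolidation_ppm`` can be sketched in the same spirit. The function below is a hypothetical reading of the parameter description ("the minimum absolute ppm difference between two matches before one is picked as most likely over the other"), not the logic of ``pick_most_likely_structures`` itself. It assumes that candidates whose absolute ppm error is within ``consolidation_ppm`` of the best candidate remain ambiguous and are all kept. The structure names are placeholders.

```python
def pick_most_likely(candidates: dict, consolidation_ppm: float) -> list:
    """Return the candidate structure(s) kept for one observed ion.

    candidates maps a structure name to its signed ppm delta.
    """
    if not candidates:
        return []
    # Rank candidates by absolute ppm error, closest first.
    ranked = sorted(candidates, key=lambda name: abs(candidates[name]))
    best = abs(candidates[ranked[0]])
    # Keep every candidate that is less than consolidation_ppm worse than the best one.
    return [name for name in ranked if abs(candidates[name]) - best < consolidation_ppm]


if __name__ == "__main__":
    # Clear winner: the second candidate is 2.5 ppm worse than the first.
    print(pick_most_likely({"structure_a": 0.5, "structure_b": -3.0}, 1.0))
    # Ambiguous: both candidates are within 1 ppm of each other.
    print(pick_most_likely({"structure_a": 0.5, "structure_b": -0.9}, 1.0))
```

Under this assumption a larger ``consolidation_ppm`` leaves more matches ambiguous, while a smaller value lets the closest match win more often.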