.. _load_consolidated_results:

load_consolidated_results
=========================

.. py:class:: ConsolidatedResultsLoader

   Class for loading and storing the consolidated results of the NL-to-SQL annotation files and other analysis outputs, such as token analysis and query statistics.

   .. py:attribute:: config_dict
      :type: dict

      Dictionary containing the configuration parameters for the analysis.

   .. py:method:: get_joined_dataframes(jointype='left')

      Joins all of the dataframes into a single dataframe. The join condition is a composite of model, database name, naturalness level, and question number.

      :param jointype: Type of join to perform. Defaults to left join.
      :type jointype: str
      :return: Joined dataframe.
      :rtype: pd.DataFrame

   .. py:method:: load_prompt_tokens(file_directory="./data/tokenizer_analysis/")

      Load the token analysis data generated by the ``tokenizer_analysis.ipynb`` workbook.

      The file contains all question prompts and their tokenizations generated by each model used in the experiments.

      :param file_directory: Directory where the files are stored.
      :type file_directory: str
      :return: pandas DataFrame containing all of the prompt token files.
      :rtype: pd.DataFrame

   .. py:method:: load_sentence_level_similarities(file_directory="./data/tokenizer_analysis/")

      Load the sentence-level embedding similarity comparisons generated by the ``tokenizer_analysis.ipynb`` workbook.

      Scores are based on cosine similarity (distance) between the semantic embeddings (SentenceTransformers) generated for an NL question and a corresponding gold query for each naturalness level.

      :param file_directory: Directory where the files are stored.
      :type file_directory: str
      :return: pandas DataFrame containing all of the question-query similarity scores for each naturalness level.
      :rtype: pd.DataFrame

   .. py:method:: load_world_level_similarities(file_directory="./data/tokenizer_analysis/")

      Load the word-identifier-level embedding similarity comparisons generated by the ``tokenizer_analysis.ipynb`` workbook.

      Scores are based on the highest cosine similarity between a schema identifier and a word in the natural language question.

      :param file_directory: Directory where the files are stored.
      :type file_directory: str
      :return: pandas DataFrame containing all of the identifier-word similarity scores for each naturalness level.
      :rtype: pd.DataFrame

   .. py:method:: load_identifier_token_analysis_files(file_directory='./data/tokenizer_analysis/')

      Load the token analysis data generated by the ``tokenizer_analysis.ipynb`` workbook.

      The file contains all database identifiers and their tokenizations generated by each model used in the experiments.

      :param file_directory: Directory where the files are stored.
      :type file_directory: str
      :return: pandas DataFrame containing all of the token files.
      :rtype: pd.DataFrame

   .. py:method:: load_question_token_analysis_files(file_directory='./data/tokenizer_analysis/')

      Load the token analysis data generated by the ``tokenizer_analysis.ipynb`` workbook.

      :param file_directory: Directory where the files are stored.
      :type file_directory: str
      :return: pandas DataFrame containing all of the token files.
      :rtype: pd.DataFrame

   .. py:method:: load_query_token_character_ratio_file(file_directory='./data/tokenizer_analysis')

      Load the query-level mean token-to-character ratios generated by the ``tokenizer_analysis.ipynb`` workbook.

      :param file_directory: Directory where the files are stored.
      :type file_directory: str
      :return: pandas DataFrame containing all of the ratio means at the question-query pair level.
      :rtype: pd.DataFrame

   .. py:method:: load_annotation_files(annotation_directory=None, database=None, remove_error_columns=True)

      Load all of the human-validated NL-to-SQL annotation files into a single dataframe.

      :param annotation_directory: Directory where the annotation files are stored.
      :type annotation_directory: str or None
      :param database: Name of the database to filter by. If None, all databases will be loaded.
      :type database: str or None
      :param remove_error_columns: Option to exclude error classification data from annotations.
      :type remove_error_columns: bool
      :return: pandas DataFrame containing all of the annotation files.
      :rtype: pd.DataFrame

   .. py:method:: load_identifier_crosswalks(file_directory="./db/schema-xwalks/consolidated_and_validated", SBODemo_full=False)

      Load a single dataframe containing identifier naturalness crosswalk data from all databases.

      :param file_directory: Directory where the files are stored.
      :type file_directory: str
      :param SBODemo_full: Use the crosswalk containing ALL SBODemo identifiers (as opposed to the benchmark subset).
      :type SBODemo_full: bool
      :return: A single dataframe containing identifier naturalness crosswalk data from all databases.
      :rtype: pd.DataFrame

.. py:function:: export_gold_data()

   Exports gold data to an Excel file.

.. toctree::
   :maxdepth: 2
   :caption: Contents:
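
The composite-key join described for ``get_joined_dataframes`` can be illustrated with a minimal pandas sketch. This is not the class's actual implementation: the column names (``model``, ``database``, ``naturalness``, ``question_number``), the helper ``join_on_composite_key``, and the toy data are all hypothetical stand-ins, since the source names the four join fields but not their column labels.

```python
import pandas as pd

# Hypothetical key columns: the documentation names the four join fields,
# but the real column labels in the loaded dataframes are an assumption.
JOIN_KEYS = ["model", "database", "naturalness", "question_number"]


def join_on_composite_key(frames, jointype="left"):
    """Merge a list of dataframes, left to right, on the composite key."""
    joined = frames[0]
    for frame in frames[1:]:
        joined = joined.merge(frame, on=JOIN_KEYS, how=jointype)
    return joined


# Toy stand-ins for an annotation table and a token-analysis table.
annotations = pd.DataFrame({
    "model": ["m1", "m1", "m2"],
    "database": ["db_a", "db_a", "db_a"],
    "naturalness": ["N1", "N2", "N1"],
    "question_number": [1, 1, 1],
    "is_correct": [True, False, True],
})
token_stats = pd.DataFrame({
    "model": ["m1", "m2"],
    "database": ["db_a", "db_a"],
    "naturalness": ["N1", "N1"],
    "question_number": [1, 1],
    "token_count": [12, 15],
})

joined = join_on_composite_key([annotations, token_stats], jointype="left")
print(joined)
```

With ``jointype="left"``, every row of the first dataframe is kept and unmatched rows receive ``NaN`` in the columns from the later dataframes. Passing ``"inner"`` instead would drop the rows that have no match on all four key columns.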