nl_to_sql_inference_and_prompt_generation

do_single_question(original_prompt, use_database, question, xwalk_directory=None, column_naturalness=0, table_naturalness=0, log=True, filename_suffix='GPT-FT', filename_prefix='', task='query', service='openai', model_name='GPT-3.5', db_type='sql server', db_list_file='.local/dbinfo.json')

Runs a single natural language question through the specified AI service to generate a predicted SQL query for the specified database.

Parameters:
  • original_prompt (str) – The initial prompt to be used.

  • use_database (str) – The database to query.

  • question (str) – The question to be appended to the prompt.

  • xwalk_directory (str, optional) – Directory for crosswalk files. Defaults to None.

  • column_naturalness (int, optional) – Level of naturalness for columns. Defaults to 0.

  • table_naturalness (int, optional) – Level of naturalness for tables. Defaults to 0.

  • log (bool, optional) – Whether to log the attempt. Defaults to True.

  • filename_suffix (str, optional) – Suffix for filenames. Defaults to ‘GPT-FT’.

  • filename_prefix (str, optional) – Prefix for filenames. Defaults to ‘’.

  • task (str, optional) – The task to perform (‘query’ or ‘tables’). Defaults to ‘query’.

  • service (str, optional) – The AI service to use (‘openai’, ‘google-vertex’, ‘google-palm’, ‘code-llama-aws’, ‘togetherai’). Defaults to ‘openai’.

  • model_name (str, optional) – The model name to use. Defaults to ‘GPT-3.5’.

  • db_type (str, optional) – The type of database (‘sql server’ or ‘sqlite’). Defaults to ‘sql server’.

  • db_list_file (str, optional) – Path to the database list file. Defaults to ‘.local/dbinfo.json’.

Returns:

A dictionary containing the prompt, SQL response, result dataframe, naturalness, and denaturalized response.

Return type:

dict
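
The documented option values can be checked before a call is made. The sketch below is a hypothetical helper (check_options is not part of this module) and assumes the values listed in the parameter descriptions are the only ones accepted, which the documentation does not state outright. The commented call at the end uses a placeholder database name and question.

```python
# Hypothetical pre-flight check; not part of the documented module.
# Assumption: the values listed in the parameter descriptions are exhaustive.
ALLOWED = {
    "task": {"query", "tables"},
    "service": {"openai", "google-vertex", "google-palm",
                "code-llama-aws", "togetherai"},
    "db_type": {"sql server", "sqlite"},
}


def check_options(**options):
    """Raise ValueError if a documented option has an undocumented value."""
    for name, value in options.items():
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError(
                f"{name}={value!r} is not one of {sorted(ALLOWED[name])}"
            )
    return options


kwargs = check_options(task="query", service="openai", db_type="sqlite")

# With the module importable, the call would then look like this
# (prompt, database name, and question are placeholders):
# result = do_single_question(
#     original_prompt="<schema prompt text>",
#     use_database="MyDatabase",
#     question="How many rows are in the survey table?",
#     **kwargs,
# )
```

Checking the options first means a misspelled service or database type fails immediately rather than after a prompt has been built and sent.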

naturalize_prompt(schema_prompt, db_name, xwalk_directory='./db/schema-xwalks/consolidated_and_validated/', column_naturalness=0, table_naturalness=0, filename_suffix='GPT-FT', filename_prefix='')

Naturalize the prompt by replacing table and column names with natural language names.

Parameters:
  • schema_prompt (str) – The prompt to naturalize.

  • db_name (str) – The name of the database on which the resulting query will be run.

  • xwalk_directory (str, optional) – The directory in which the crosswalk files are stored. Defaults to ‘./db/schema-xwalks/consolidated_and_validated/’.

  • column_naturalness (int, optional) – The level of naturalness to use for column names. Defaults to 0.

  • table_naturalness (int, optional) – The level of naturalness to use for table names. Defaults to 0.

  • filename_suffix (str, optional) – The suffix to use for the crosswalk files. Defaults to ‘GPT-FT’.

  • filename_prefix (str, optional) – The prefix to use for the crosswalk files. Defaults to ‘’.

Returns:

A tuple containing a dictionary with the naturalness levels and the naturalized schema prompt.

Return type:

tuple[dict, str]
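
The idea of naturalization can be illustrated with a minimal sketch. It is not the module's implementation: the crosswalk here is a hypothetical in-memory dictionary (the real crosswalk files and their format are not described in this reference), the identifiers are invented for illustration, and the sketch returns only the rewritten text rather than the documented tuple.

```python
import re

# Hypothetical crosswalk: native identifier -> natural-language name.
# Assumption: the real crosswalk files carry an equivalent mapping.
XWALK = {"tbl_Obs": "observations", "SpCd": "species_code"}


def naturalize(schema_prompt, xwalk):
    """Replace whole-word native identifiers with their natural names."""
    if not xwalk:
        return schema_prompt
    # Longest names first so a shorter identifier never preempts a longer one.
    names = sorted(xwalk, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: xwalk[m.group(1)], schema_prompt)


schema = "CREATE TABLE tbl_Obs (SpCd TEXT, SpCdNote TEXT)"
result = naturalize(schema, XWALK)
print(result)
# CREATE TABLE observations (species_code TEXT, SpCdNote TEXT)
```

The word-boundary anchors keep SpCdNote intact, because it is a different identifier that only shares a prefix with SpCd.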

denaturalize_query(query, naturalness, xwalk_directory='./db/schema-xwalks/consolidated_and_validated/', db_name='PacificIslandLandbirds', filename_suffix='GPT-FT', filename_prefix='', syntax='tsql', target_naturalness='native')

Denaturalize a query by replacing natural language table and column names with their native identifiers.

Parameters:
  • query (str) – The query to denaturalize.

  • naturalness (dict) – A dictionary with keys ‘table’ and ‘column’ and values corresponding to the naturalness level used for each.

  • xwalk_directory (str, optional) – The directory in which the crosswalk files are stored. Defaults to ‘./db/schema-xwalks/consolidated_and_validated/’.

  • db_name (str, optional) – The name of the database on which the resulting query will be run. Defaults to ‘PacificIslandLandbirds’.

  • filename_suffix (str, optional) – The suffix to use for the crosswalk files. Defaults to ‘GPT-FT’.

  • filename_prefix (str, optional) – The prefix to use for the crosswalk files. Defaults to ‘’.

  • syntax (str, optional) – The SQL syntax to use (‘tsql’ or ‘sqlite’). Defaults to ‘tsql’.

  • target_naturalness (str, optional) – The target naturalness level. Defaults to ‘native’.

Returns:

The denaturalized query.

Return type:

str
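
The reverse mapping can be sketched the same way. This is an illustration under stated assumptions, not the module's code: the crosswalk is a hypothetical dictionary with invented identifiers, matching is case-sensitive, and the syntax and target_naturalness options are ignored.

```python
import re

# Hypothetical crosswalk (native -> natural); the real file format is not shown here.
XWALK = {"tbl_Obs": "observations", "SpCd": "species_code"}


def denaturalize(query, xwalk):
    """Map natural names in a query back to their native identifiers."""
    reverse = {natural: native for native, natural in xwalk.items()}
    # The mapping is only reversible if every natural name is unique.
    if len(reverse) != len(xwalk):
        raise ValueError("two native identifiers share one natural name")
    if not reverse:
        return query
    names = sorted(reverse, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: reverse[m.group(1)], query)


query = "SELECT species_code FROM observations WHERE species_code IS NOT NULL"
native_query = denaturalize(query, XWALK)
print(native_query)
# SELECT SpCd FROM tbl_Obs WHERE SpCd IS NOT NULL
```

Plain text substitution cannot tell an identifier from a string literal that happens to contain the same word, so a sketch like this is only safe for queries without such literals. Handling them would need a SQL parser.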

log_attempt(prompt, response, result_df, database, model_name, naturalness={'table': 0, 'column': 0}, denaturalized_response=None)

Logs the attempt to a file.

Parameters:
  • prompt (str) – The prompt used.

  • response (str) – The response received.

  • result_df (pandas.DataFrame) – The result dataframe.

  • database (str) – The database used.

  • model_name (str) – The model name used.

  • naturalness (dict, optional) – The naturalness level used. Defaults to {‘table’: 0, ‘column’: 0}.

  • denaturalized_response (str, optional) – The denaturalized response. Defaults to None.
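
A logger of this shape could be sketched as follows. Everything here is an assumption made for illustration: the JSON Lines format, the record field names, and the explicit log_path argument are not documented for log_attempt, and result_df is left out so the sketch runs without pandas. The sketch also uses a None sentinel in place of a dictionary default, a common Python pattern that avoids sharing one mutable default object across calls.

```python
import json
import tempfile
from pathlib import Path


def log_attempt_sketch(log_path, prompt, response, database, model_name,
                       naturalness=None, denaturalized_response=None):
    """Append one attempt to a file as a JSON line (hypothetical format)."""
    # Build a fresh dict per call instead of reusing a mutable default.
    if naturalness is None:
        naturalness = {"table": 0, "column": 0}
    record = {
        "prompt": prompt,
        "response": response,
        "database": database,
        "model_name": model_name,
        "naturalness": naturalness,
        "denaturalized_response": denaturalized_response,
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


# Write one entry to a throwaway directory.
log_file = Path(tempfile.mkdtemp()) / "attempts.jsonl"
entry = log_attempt_sketch(
    log_file, "schema + question", "SELECT 1", "MyDatabase", "GPT-3.5"
)
```

Appending one JSON object per line keeps the log readable while letting each attempt be parsed independently.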

denaturalize_query_test()

Test function for denaturalize_query.

do_single_question_test()

Test function for do_single_question.