db_util

db_util.do_query(query: str, database_name: str, debug: bool = False, convert_bytes_to_str: bool = False, db_list_file: str = '.local/dbinfo.json') pd.DataFrame

Executes a SQL query on a specified database and returns the result as a pandas DataFrame.

Parameters:
  • query (str) – The SQL query to execute.

  • database_name (str) – The name of the database to connect to.

  • debug (bool) – If True, prints debug information. Defaults to False.

  • convert_bytes_to_str (bool) – If True, converts byte values to strings using cp437 encoding. Defaults to False.

  • db_list_file (str) – The path to the database list file. Defaults to “.local/dbinfo.json”.

Returns:

A DataFrame containing the query results. If the query returns no results, an empty DataFrame is returned.

Return type:

pd.DataFrame

db_util.connect_to_db(database_name: str, db_list_file: str = '.local/dbinfo.json') pyodbc.Connection

Establishes a connection to the specified database.

Parameters:
  • database_name (str) – The name of the database to connect to.

  • db_list_file (str) – The path to the JSON file containing database connection information. Defaults to ‘.local/dbinfo.json’.

Returns:

A connection object to the specified database.

Return type:

pyodbc.Connection

Raises:
  • FileNotFoundError – If the db_list_file does not exist.

  • KeyError – If the database information is not found in the db_list_file.

  • pyodbc.Error – If there is an error connecting to the database.

db_util.get_tables_and_columns_from_sql_server_db(db_name: str, append_col_types: bool = True, table_list: list = None, uppercase: bool = False, schema: str = 'dbo') dict

Returns a dictionary of tables and columns from a SQL Server database.

Parameters:
  • db_name (str) – The name of the database to use for the schema.

  • append_col_types (bool) – Whether to append the column types to the column names. Defaults to True.

  • table_list (list or None) – A list of tables to limit the request to. Defaults to None.

  • uppercase (bool) – Whether to return table and column names in uppercase. Defaults to False.

  • schema (str) – The database schema to query. Defaults to “dbo”.

Returns:

A dictionary where keys are table names and values are lists of column names.

Return type:

dict

db_util.get_tables_and_columns_df(tables_and_col_dict: dict) pd.DataFrame

Converts a dictionary of tables and their columns into a pandas DataFrame.

Parameters:

tables_and_col_dict (dict) – A dictionary where keys are table names and values are lists of column names.

Returns:

A DataFrame with two columns: ‘table’ and ‘column’.

Return type:

pd.DataFrame

db_util.make_db_schema_prompt(db_name: str, db_type: str = 'MS SQL Server', task: str = 'query', table_list: list = None, column_list: list = None, identifier_tags: bool = False, schema: str = 'dbo', table_col_function=get_tables_and_columns_from_sql_server_db)

Creates a zero-shot prompt with schema knowledge for an LLM query.

Parameters:
  • db_name (str) – The name of the database to use for the schema.

  • db_type (str) – The type of database. Defaults to ‘MS SQL Server’.

  • task (str) – The task to be performed (‘query’ or ‘tables’). Defaults to ‘query’.

  • table_list (list or None) – A list of tables to include in the prompt. Defaults to None.

  • column_list (list or None) – A list of columns to include in the prompt. Defaults to None.

  • identifier_tags (bool) – Whether to include tags around table and column names. Defaults to False.

  • schema (str) – The database schema to query. Defaults to “dbo”.

  • table_col_function (function) – The function to use for retrieving table and column information. Defaults to get_tables_and_columns_from_sql_server_db().

Returns:

The prompt to be used for the LLM query.

Return type:

str

db_util.get_table_cardinality(db_name, table_name) int

Retrieves the number of rows in a specified table.

Parameters:
  • db_name (str) – The name of the database.

  • table_name (str) – The name of the table.

Returns:

The number of rows in the table.

Return type:

int

db_util.get_db_tables_cardinality(db_name, save_to_temp_folder=False) pd.DataFrame

Retrieves the cardinality for each table in the specified database.

Parameters:
  • db_name (str) – The name of the database.

  • save_to_temp_folder (bool) – If True, saves the resulting DataFrame to a CSV file. Defaults to False.

Returns:

A DataFrame containing table names and their cardinalities.

Return type:

pd.DataFrame

Raises:

Exception – If there is an error retrieving the cardinality for a table.