db_util¶
- db_util.do_query(query: str, database_name: str, debug: bool = False, convert_bytes_to_str: bool = False, db_list_file: str = '.local/dbinfo.json') pd.DataFrame ¶
Executes a SQL query on a specified database and returns the result as a pandas DataFrame.
- Parameters:
query (str) – The SQL query to execute.
database_name (str) – The name of the database to connect to.
debug (bool) – If True, prints debug information. Defaults to False.
convert_bytes_to_str (bool) – If True, converts byte values to strings using cp437 encoding. Defaults to False.
db_list_file (str) – The path to the database list file. Defaults to “.local/dbinfo.json”.
- Returns:
A DataFrame containing the query results. If the query returns no results, an empty DataFrame is returned.
- Return type:
pd.DataFrame
- db_util.connect_to_db(database_name: str, db_list_file: str = '.local/dbinfo.json') pyodbc.Connection ¶
Establishes a connection to the specified database.
- Parameters:
database_name (str) – The name of the database to connect to.
db_list_file (str) – The path to the JSON file containing database connection information. Defaults to ‘.local/dbinfo.json’.
- Returns:
A connection object to the specified database.
- Return type:
pyodbc.Connection
- Raises:
FileNotFoundError – If the db_list_file does not exist.
KeyError – If the database information is not found in the db_list_file.
pyodbc.Error – If there is an error connecting to the database.
- db_util.get_tables_and_columns_from_sql_server_db(db_name: str, append_col_types: bool = True, table_list: list = None, uppercase: bool = False, schema: str = 'dbo') dict ¶
Returns a dictionary of tables and columns from a SQL Server database.
- Parameters:
db_name (str) – The name of the database to use for the schema.
append_col_types (bool) – Whether to append the column types to the column names. Defaults to True.
table_list (list or None) – A list of tables to limit the request to. Defaults to None.
uppercase (bool) – Whether to return table and column names in uppercase. Defaults to False.
schema (str) – The database schema to query. Defaults to “dbo”.
- Returns:
A dictionary where keys are table names and values are lists of column names.
- Return type:
dict
- db_util.get_tables_and_columns_df(tables_and_col_dict: dict) pd.DataFrame ¶
Converts a dictionary of tables and their columns into a pandas DataFrame.
- Parameters:
tables_and_col_dict (dict) – A dictionary where keys are table names and values are lists of column names.
- Returns:
A DataFrame with two columns: ‘table’ and ‘column’.
- Return type:
pd.DataFrame
- db_util.make_db_schema_prompt(db_name: str, db_type: str = 'MS SQL Server', task: str = 'query', table_list: list = None, column_list: list = None, identifier_tags: bool = False, schema: str = 'dbo', table_col_function=get_tables_and_columns_from_sql_server_db)¶
Creates a zero-shot prompt with schema knowledge for an LLM query.
- Parameters:
db_name (str) – The name of the database to use for the schema.
db_type (str) – The type of database. Defaults to ‘MS SQL Server’.
task (str) – The task to be performed (‘query’ or ‘tables’). Defaults to ‘query’.
table_list (list or None) – A list of tables to include in the prompt. Defaults to None.
column_list (list or None) – A list of columns to include in the prompt. Defaults to None.
identifier_tags (bool) – Whether to include tags around table and column names. Defaults to False.
schema (str) – The database schema to query. Defaults to “dbo”.
table_col_function (function) – The function to use for retrieving table and column information. Defaults to
get_tables_and_columns_from_sql_server_db()
.
- Returns:
The prompt to be used for the LLM query.
- Return type:
str
- db_util.get_table_cardinality(db_name, table_name) int ¶
Retrieves the number of rows in a specified table.
- Parameters:
db_name (str) – The name of the database.
table_name (str) – The name of the table.
- Returns:
The number of rows in the table.
- Return type:
int
- db_util.get_db_tables_cardinality(db_name, save_to_temp_folder=False) pd.DataFrame ¶
Retrieves the cardinality for each table in the specified database.
- Parameters:
db_name (str) – The name of the database.
save_to_temp_folder (bool) – If True, saves the resulting DataFrame to a CSV file. Defaults to False.
- Returns:
A DataFrame containing table names and their cardinalities.
- Return type:
pd.DataFrame
- Raises:
Exception – If there is an error retrieving the cardinality for a table.