db_util¶

db_util.do_query(query: str, database_name: str, debug: bool = False, convert_bytes_to_str: bool = False, db_list_file: str = '.local/dbinfo.json') → pd.DataFrame¶

Executes a SQL query on a specified database and returns the result as a pandas DataFrame.

Parameters:

query (str) – The SQL query to execute.
database_name (str) – The name of the database to connect to.
debug (bool) – If True, prints debug information. Defaults to False.
convert_bytes_to_str (bool) – If True, converts byte values to strings using cp437 encoding. Defaults to False.
db_list_file (str) – The path to the database list file. Defaults to “.local/dbinfo.json”.

Returns:

A DataFrame containing the query results. If the query returns no results, an empty DataFrame is returned.

Return type:

pd.DataFrame

db_util.connect_to_db(database_name: str, db_list_file: str = '.local/dbinfo.json') → pyodbc.Connection¶

Establishes a connection to the specified database.

Parameters:

database_name (str) – The name of the database to connect to.
db_list_file (str) – The path to the JSON file containing database connection information. Defaults to ‘.local/dbinfo.json’.

Returns:

A connection object to the specified database.

Return type:

pyodbc.Connection

Raises:

FileNotFoundError – If the db_list_file does not exist.
KeyError – If the database information is not found in the db_list_file.
pyodbc.Error – If there is an error connecting to the database.

db_util.get_tables_and_columns_from_sql_server_db(db_name: str, append_col_types: bool = True, table_list: list = None, uppercase: bool = False, schema: str = 'dbo') → dict¶

Returns a dictionary of tables and columns from a SQL Server database.

Parameters:

db_name (str) – The name of the database to use for the schema.
append_col_types (bool) – Whether to append the column types to the column names. Defaults to True.
table_list (list or None) – A list of tables to limit the request to. Defaults to None.
uppercase (bool) – Whether to return table and column names in uppercase. Defaults to False.
schema (str) – The database schema to query. Defaults to “dbo”.

Returns:

A dictionary where keys are table names and values are lists of column names.

Return type:

dict

db_util.get_tables_and_columns_df(tables_and_col_dict: dict) → pd.DataFrame¶

Converts a dictionary of tables and their columns into a pandas DataFrame.

Parameters:: tables_and_col_dict (dict) – A dictionary where keys are table names and values are lists of column names.
Returns:: A DataFrame with two columns: ‘table’ and ‘column’.
Return type:: pd.DataFrame

db_util.make_db_schema_prompt(db_name: str, db_type: str = 'MS SQL Server', task: str = 'query', table_list: list = None, column_list: list = None, identifier_tags: bool = False, schema: str = 'dbo', table_col_function=get_tables_and_columns_from_sql_server_db)¶

Creates a zero-shot prompt with schema knowledge for an LLM query.

Parameters:

db_name (str) – The name of the database to use for the schema.
db_type (str) – The type of database. Defaults to ‘MS SQL Server’.
task (str) – The task to be performed (‘query’ or ‘tables’). Defaults to ‘query’.
table_list (list or None) – A list of tables to include in the prompt. Defaults to None.
column_list (list or None) – A list of columns to include in the prompt. Defaults to None.
identifier_tags (bool) – Whether to include tags around table and column names. Defaults to False.
schema (str) – The database schema to query. Defaults to “dbo”.
table_col_function (function) – The function to use for retrieving table and column information. Defaults to get_tables_and_columns_from_sql_server_db().

Returns:

The prompt to be used for the LLM query.

Return type:

str

db_util.get_table_cardinality(db_name, table_name) → int¶

Retrieves the number of rows in a specified table.

Parameters:

db_name (str) – The name of the database.
table_name (str) – The name of the table.

Returns:

The number of rows in the table.

Return type:

int

db_util.get_db_tables_cardinality(db_name, save_to_temp_folder=False) → pd.DataFrame¶

Retrieves the cardinality for each table in the specified database.

Parameters:

db_name (str) – The name of the database.
save_to_temp_folder (bool) – If True, saves the resulting DataFrame to a CSV file. Defaults to False.

Returns:

A DataFrame containing table names and their cardinalities.

Return type:

pd.DataFrame

Raises:

Exception – If there is an error retrieving the cardinality for a table.

db_util¶

SNAILS

Navigation

Related Topics