db_schema_classifier
- classify_db_schema(db_name, tagged=True, db_type='ms sql')
Classifies the naturalness of table and column names in a database schema. This function relies on the SNAILS db_util.py or db_util_sqlite.py utilities, so the database must be registered in the ./.local/dbinfo.json or dbinfo_sqlite.json file before use.
- Parameters:
db_name (str) – The name of the database to classify.
tagged (bool) – Whether to tag the identifiers.
db_type (str) – The type of the database, either ‘ms sql’ or ‘sqlite’.
- Raises:
ModuleNotFoundError – If the snails_naturalness_classifier module is not found.
- Returns:
A DataFrame containing the table names, table scores, column names, column scores, and model used.
- Return type:
pd.DataFrame
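The parameters and the documented ModuleNotFoundError could be handled along the following lines. This is a minimal sketch, not part of the module: the wrapper name try_classify_schema, the top-level import name db_schema_classifier, the database name my_database, and the up-front db_type check are all assumptions made for illustration.

```python
import importlib

# The two values listed in the db_type parameter description.
SUPPORTED_DB_TYPES = ("ms sql", "sqlite")


def try_classify_schema(db_name, tagged=True, db_type="ms sql"):
    """Hypothetical wrapper: return the classification result, or None
    when a required module cannot be imported."""
    # This check belongs to the wrapper; the documented function may
    # treat unsupported values differently.
    if db_type not in SUPPORTED_DB_TYPES:
        raise ValueError(
            f"db_type must be one of {SUPPORTED_DB_TYPES}, got {db_type!r}"
        )
    try:
        # Assumption: the module is importable under this top-level name.
        classifier = importlib.import_module("db_schema_classifier")
        return classifier.classify_db_schema(
            db_name, tagged=tagged, db_type=db_type
        )
    except ModuleNotFoundError as err:
        # Covers both a missing db_schema_classifier module and the
        # documented case of a missing snails_naturalness_classifier.
        print(f"Schema classification unavailable: {err}")
        return None


if __name__ == "__main__":
    # "my_database" is a placeholder; it must already be registered
    # in the relevant dbinfo file.
    result = try_classify_schema("my_database", db_type="sqlite")
    if result is not None:
        print(result.head())
```

Returning None instead of re-raising keeps a calling script running when the classifier is not installed; a caller that needs to fail fast can drop the try block.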
- classify_batch_with_canine(batch_filepath)
Classifies tables and columns in a batch file using the CanineIdentifierClassifier.
- Parameters:
batch_filepath (str) – The file path to the batch CSV file containing table and column names.
- Returns:
A DataFrame containing the table names, table scores, column names, column scores, and the model used for classification. If the batch file contains a ‘DATABASE_NAME’ column, that column is also included in the output DataFrame.
- Return type:
pd.DataFrame
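A batch file for this function might be prepared as sketched below. Only the DATABASE_NAME header is named in the description above; the TABLE_NAME and COLUMN_NAME headers, the helper names, and the example identifiers are assumptions, so check them against the headers the classifier actually expects.

```python
import csv
import os
import tempfile

# "DATABASE_NAME" comes from the documentation above; the other two
# header names are assumed for illustration.
BATCH_HEADERS = ["DATABASE_NAME", "TABLE_NAME", "COLUMN_NAME"]


def write_batch_csv(rows, filepath):
    """Write one CSV row per (database, table, column) identifier triple."""
    with open(filepath, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=BATCH_HEADERS)
        writer.writeheader()
        writer.writerows(rows)
    return filepath


def read_batch_csv(filepath):
    """Read the batch file back as a list of dicts for a quick sanity check."""
    with open(filepath, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Made-up identifiers, not taken from any real schema.
example_rows = [
    {"DATABASE_NAME": "demo_db", "TABLE_NAME": "tbl_cust", "COLUMN_NAME": "cust_nm"},
    {"DATABASE_NAME": "demo_db", "TABLE_NAME": "tbl_cust", "COLUMN_NAME": "customer_name"},
]

batch_path = write_batch_csv(
    example_rows, os.path.join(tempfile.mkdtemp(), "example_batch.csv")
)
print(f"Wrote {len(read_batch_csv(batch_path))} rows to {batch_path}")
```

The resulting path would then be passed as batch_filepath, for example classify_batch_with_canine(batch_path).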
- main()
Main function to demonstrate the usage of the schema classification functions. This function sets a database name, then calls the classify_db_schema function. Commented-out lines show alternative usage scenarios.