db_schema_classifier

classify_db_schema(db_name, tagged=True, db_type='ms sql')

Classifies the naturalness of table and column names in a database schema. The function relies on the SNAILS db_util.py or db_util_sqlite.py utilities, so each database must be registered in the ./.local/dbinfo.json or dbinfo_sqlite.json file before use.
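The description above implies that db_type selects both a utility module and a registry file. The following is a minimal sketch of that dispatch. It assumes 'ms sql' pairs with db_util / dbinfo.json and 'sqlite' pairs with db_util_sqlite / dbinfo_sqlite.json. The names DB_TYPE_BACKENDS and resolve_backend are hypothetical and are not part of SNAILS.

```python
# Hypothetical sketch: pair each supported db_type with the SNAILS utility
# module and registry file it is assumed to use. Not the actual SNAILS code.
DB_TYPE_BACKENDS: dict[str, tuple[str, str]] = {
    "ms sql": ("db_util", "dbinfo.json"),
    "sqlite": ("db_util_sqlite", "dbinfo_sqlite.json"),
}


def resolve_backend(db_type: str = "ms sql") -> tuple[str, str]:
    """Return (utility module name, registry file name) for a supported db_type."""
    try:
        return DB_TYPE_BACKENDS[db_type]
    except KeyError:
        allowed = ", ".join(repr(k) for k in DB_TYPE_BACKENDS)
        # Fail early with the accepted values instead of surfacing a bare KeyError.
        raise ValueError(
            f"unsupported db_type {db_type!r}; expected one of {allowed}"
        ) from None


print(resolve_backend("sqlite"))  # ('db_util_sqlite', 'dbinfo_sqlite.json')
```

Validating db_type up front means a typo such as 'mssql' produces a clear error before any connection is attempted.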

Parameters:

db_name (str) – The name of the database to classify.
tagged (bool) – Whether to tag the identifiers.
db_type (str) – The type of the database, either ‘ms sql’ or ‘sqlite’.

Raises:

ModuleNotFoundError – If the snails_naturalness_classifier module is not found.

Returns:

A DataFrame containing the table names, table scores, column names, column scores, and the model used.

Return type:

pd.DataFrame
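The ModuleNotFoundError listed above suggests that snails_naturalness_classifier is imported when the function runs. The following guarded import is one way to report that failure clearly. It is a sketch, and load_naturalness_classifier is a hypothetical helper, not the module's actual code.

```python
import importlib
from types import ModuleType


def load_naturalness_classifier(
    module_name: str = "snails_naturalness_classifier",
) -> ModuleType:
    """Import the classifier module, re-raising with a clearer hint if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        if exc.name != module_name:
            # The module exists but one of its own dependencies is missing;
            # keep the original error so the real culprit is visible.
            raise
        raise ModuleNotFoundError(
            f"{module_name} is not installed or not on sys.path; "
            "install it before calling classify_db_schema",
            name=module_name,
        ) from exc
```

Checking exc.name keeps a missing transitive dependency from being misreported as the classifier module itself being absent.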

classify_batch_with_canine(batch_filepath)

Classifies tables and columns in a batch file using the CanineIdentifierClassifier.

Parameters:

batch_filepath (str) – The path to the batch CSV file containing table and column names.

Returns:

A DataFrame containing the table names, table scores, column names, column scores, and the model used for classification. If the batch file contains a ‘DATABASE_NAME’ column, that column is carried through to the output DataFrame.

Return type:

pd.DataFrame
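The following sketch shows the batch flow described above: read rows, score each table and column name, and carry DATABASE_NAME through only when it is present. It uses the standard csv module instead of pandas. The CSV headers TABLE_NAME and COLUMN_NAME, the output keys, and the "canine" model label are assumptions. classify_batch_rows and placeholder_score are hypothetical, and placeholder_score stands in for the CanineIdentifierClassifier.

```python
import csv
import io
from typing import Callable, Iterable


def classify_batch_rows(
    lines: Iterable[str],
    score: Callable[[str], float],
    model_name: str = "canine",  # hypothetical label for the model column
) -> list[dict]:
    """Score table and column names row by row; assumed column names throughout."""
    reader = csv.DictReader(lines)
    # Only pass DATABASE_NAME through if the input actually has that column.
    has_db = "DATABASE_NAME" in (reader.fieldnames or [])
    rows = []
    for rec in reader:
        out = {
            "TABLE_NAME": rec["TABLE_NAME"],
            "TABLE_SCORE": score(rec["TABLE_NAME"]),
            "COLUMN_NAME": rec["COLUMN_NAME"],
            "COLUMN_SCORE": score(rec["COLUMN_NAME"]),
            "MODEL": model_name,
        }
        if has_db:
            out["DATABASE_NAME"] = rec["DATABASE_NAME"]
        rows.append(out)
    return rows


def placeholder_score(identifier: str) -> float:
    """Stand-in for the CanineIdentifierClassifier; always returns 0.0."""
    return 0.0


sample = io.StringIO("DATABASE_NAME,TABLE_NAME,COLUMN_NAME\nsales,cust_tbl,cst_nm\n")
for row in classify_batch_rows(sample, placeholder_score):
    print(row)
```

Passing the scorer in as a callable keeps the sketch independent of the CANINE model. The rows can be wrapped with pd.DataFrame(rows) when a DataFrame is needed.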

main()

Demonstrates the schema classification functions. It sets a database name and passes it to classify_db_schema. Commented-out lines in the function body show alternative usage scenarios.
