db_schema_classifier

classify_db_schema(db_name, tagged=True, db_type='ms sql')

Classifies the naturalness of table and column names in a database schema. This function relies on the SNAILS db_util.py or db_util_sqlite.py utilities, so the database must be registered in the ./.local/dbinfo.json or dbinfo_sqlite.json file before use.

Parameters:
  • db_name (str) – The name of the database to classify.

  • tagged (bool) – Whether to tag the identifiers. Defaults to True.

  • db_type (str) – The type of the database, either 'ms sql' or 'sqlite'. Defaults to 'ms sql'.

Raises:

ModuleNotFoundError – If the snails_naturalness_classifier module is not found.

Returns:

A DataFrame containing the table names, table scores, column names, column scores, and model used.

Return type:

pd.DataFrame
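
As a rough illustration of the call and its documented ModuleNotFoundError, the sketch below wraps classify_db_schema in a small helper. The helper name classify_schema_safely, the SUPPORTED_DB_TYPES tuple, and the assumption that the module is importable as db_schema_classifier are not part of the documented API; they are introduced here only for illustration.

```python
import importlib

# The two db_type values listed in the parameter description above.
SUPPORTED_DB_TYPES = ("ms sql", "sqlite")


def classify_schema_safely(db_name, tagged=True, db_type="ms sql"):
    """Hypothetical wrapper: call classify_db_schema, or return None
    when a required module cannot be imported."""
    if db_type not in SUPPORTED_DB_TYPES:
        raise ValueError(
            f"db_type must be one of {SUPPORTED_DB_TYPES}, got {db_type!r}"
        )
    try:
        # Assumption: the module is importable under this name.
        module = importlib.import_module("db_schema_classifier")
        return module.classify_db_schema(db_name, tagged=tagged, db_type=db_type)
    except ModuleNotFoundError as err:
        # Covers both a missing db_schema_classifier module and the
        # documented case of a missing snails_naturalness_classifier.
        print(f"Classification unavailable: {err}")
        return None
```

A call such as classify_schema_safely("my_database", db_type="sqlite") would then return either the DataFrame described above or None. Here "my_database" is a placeholder for a database that has already been registered in the relevant dbinfo file.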

classify_batch_with_canine(batch_filepath)

Classifies tables and columns in a batch file using the CanineIdentifierClassifier.

Parameters:

batch_filepath (str) – The file path to the batch CSV file containing table and column names.

Returns:

A DataFrame containing the table names, table scores, column names, column scores, and the model used for classification. If the batch file contains a 'DATABASE_NAME' column, that column is also included in the output DataFrame.

Return type:

pd.DataFrame
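
A batch file for this function might be prepared along the following lines. This is only a sketch: the documentation above names the DATABASE_NAME column but not the other headers, so TABLE_NAME and COLUMN_NAME are assumed names, as are the write_batch_csv helper and the sample identifiers.

```python
import csv
import tempfile
from pathlib import Path

# Assumption: TABLE_NAME and COLUMN_NAME are hypothetical header names;
# only DATABASE_NAME is mentioned in the documentation above.
FIELDNAMES = ["DATABASE_NAME", "TABLE_NAME", "COLUMN_NAME"]


def write_batch_csv(rows, path):
    """Write (database, table, column) tuples to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(FIELDNAMES)
        writer.writerows(rows)
    return str(path)


# Made-up sample identifiers, one row per column to classify.
rows = [
    ("sales_db", "customer_orders", "order_date"),
    ("sales_db", "customer_orders", "ord_amt"),
    ("sales_db", "tbl_cst", "cst_nm"),
]
batch_path = write_batch_csv(rows, Path(tempfile.gettempdir()) / "schema_batch.csv")

# With the SNAILS modules installed, the file could then be classified:
# from db_schema_classifier import classify_batch_with_canine
# scores = classify_batch_with_canine(batch_path)
```

The header names should be checked against the classifier source before use, since a mismatch would likely cause the batch to be rejected or misread.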

main()

Demonstrates the schema classification functions. This function sets a database name, then calls classify_db_schema. Commented-out lines in the source show alternative usage scenarios.