schemarenamer

schemarenamer.main()

Runs the script end to end: classifies the database schema, renames the schema identifiers, and saves the results to an Excel file.

schemarenamer.transform_score_df(score_df: pandas.DataFrame) → pandas.DataFrame

Transforms the input score DataFrame to combine table and column scores, lowercase identifiers, and remove duplicates.

Parameters:

score_df (pandas.DataFrame) – DataFrame containing table and column scores.

Returns:

Transformed DataFrame with combined scores.

Return type:

pandas.DataFrame
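
The three steps named above (combine table and column scores, lowercase identifiers, remove duplicates) can be sketched in pandas as follows. This is only an illustration, not the module's implementation: the function name combine_scores_sketch and the column names table_name, table_score, column_name, column_score, identifier, and score are assumptions, since the actual layout of score_df is not documented here.

```python
import pandas as pd


def combine_scores_sketch(score_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for transform_score_df; column names are assumed."""
    # Put table scores and column scores under one shared pair of column names.
    tables = score_df[["table_name", "table_score"]].rename(
        columns={"table_name": "identifier", "table_score": "score"}
    )
    columns = score_df[["column_name", "column_score"]].rename(
        columns={"column_name": "identifier", "column_score": "score"}
    )
    # Stack both sets of rows into a single frame.
    combined = pd.concat([tables, columns], ignore_index=True)
    # Lowercase identifiers so case variants collapse to one entry.
    combined["identifier"] = combined["identifier"].str.lower()
    # Keep the first score seen for each identifier.
    return combined.drop_duplicates(subset="identifier").reset_index(drop=True)


# Made-up sample rows, for illustration only.
example = pd.DataFrame(
    {
        "table_name": ["Birds", "Birds"],
        "table_score": ["N1", "N1"],
        "column_name": ["SpeciesCode", "obs_dt"],
        "column_score": ["N1", "N3"],
    }
)
result = combine_scores_sketch(example)
print(result)
```

In this sketch a table name that repeats on every row collapses to a single entry, and when an identifier appears more than once the first score encountered wins. Whether the real function resolves conflicting scores the same way is not stated in this reference.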

schemarenamer.do_schema_renaming(database_name='PacificIslandLandbirds', score_lookup_file='./data/gold-data/identifier-scores-evaluated-5-9-2024.xlsx', continuous_write=False, db_type='ms sql', db_classifier_score_df=None, only_most_natural=False, verbose=True) → pandas.DataFrame

Renames schema identifiers (tables and columns) in a database based on human-evaluated naturalness scores.

Parameters:
  • database_name (str) – The name of the database. Defaults to “PacificIslandLandbirds”.

  • score_lookup_file (str) – Path to the Excel file containing human-evaluated scores. Defaults to “./data/gold-data/identifier-scores-evaluated-5-9-2024.xlsx”.

  • continuous_write (bool) – If True, writes logs continuously. Defaults to False.

  • db_type (str) – The type of the database (“ms sql” or “sqlite”). Defaults to “ms sql”.

  • db_classifier_score_df (pandas.DataFrame or None) – DataFrame with classifier scores. If None, reads from score_lookup_file. Defaults to None.

  • only_most_natural (bool) – If True, only generates the most natural identifier. Defaults to False.

  • verbose (bool) – If True, prints progress information. Defaults to True.

Returns:

DataFrame containing original and generated identifiers with scores and errors.

Return type:

pandas.DataFrame

schemarenamer.do_fewshot_identifier_transform(identifier, naturalness, data_dict_interpreter=None, only_most_natural=False, verbose=True, gpt_model='gpt-4o') → dict

Transforms a given identifier to different naturalness levels using few-shot prompting and a data dictionary interpreter.

Parameters:
  • identifier (str) – The identifier to transform.

  • naturalness (str) – The original naturalness level of the identifier (e.g., “N1”, “N2”, “N3”).

  • data_dict_interpreter (SNAILS_Artifacts.naturalness_modifier.data_dict_reader.DataDictInterpreter or None) – An instance of DataDictInterpreter for retrieving natural identifiers. Defaults to None.

  • only_most_natural (bool) – If True, only generates the most natural identifier. Defaults to False.

  • verbose (bool) – If True, prints progress information. Defaults to True.

  • gpt_model (str) – The GPT model to use. Defaults to “gpt-4o”.

Returns:

A dictionary containing the transformed identifiers for different naturalness levels.

Return type:

dict
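
As a rough illustration of few-shot prompting for this kind of transform, the sketch below assembles a prompt from worked examples and a list of target naturalness levels. It does not call any model and is not the module's implementation. The names target_levels and build_fewshot_prompt, the prompt wording, the sample identifiers, the choice to skip the identifier's current level, and the treatment of “N1” as the most natural level are all assumptions made for the example.

```python
# Level labels as they appear in the parameter description above.
LEVELS = ("N1", "N2", "N3")


def target_levels(naturalness, only_most_natural=False, most_natural="N1"):
    """Return the levels to generate for an identifier at `naturalness`.

    Assumption: "N1" is the most natural level, and the current level
    is skipped when all levels are requested.
    """
    if naturalness not in LEVELS:
        raise ValueError(f"unknown naturalness level: {naturalness!r}")
    if only_most_natural:
        return [most_natural]
    return [level for level in LEVELS if level != naturalness]


def build_fewshot_prompt(identifier, naturalness, examples, only_most_natural=False):
    """Build a few-shot prompt string (hypothetical format, no model call).

    `examples` is a list of (source, level, rewritten) tuples that act
    as the worked demonstrations shown before the new identifier.
    """
    lines = ["Rewrite each identifier at the requested naturalness levels."]
    for source, level, rewritten in examples:
        lines.append(f"Identifier: {source} -> {level}: {rewritten}")
    targets = ", ".join(target_levels(naturalness, only_most_natural))
    lines.append(f"Identifier: {identifier} (currently {naturalness}) -> {targets}:")
    return "\n".join(lines)


# Made-up demonstration pair and input identifier.
prompt = build_fewshot_prompt("obs_dt", "N3", [("spp_cd", "N1", "species_code")])
print(prompt)
```

A caller would send a prompt of this kind to the model named by gpt_model and parse the reply into the returned dictionary. The keys of that dictionary are not specified in this reference, so the sketch stops at the prompt.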