end_to_end_data_prep_and_prediction

main(model: str, service: str, naturalness: str, database: str, bypass_nl_sql_inference: bool = True, db_list_file: str = '.local/dbinfo.json')

Executes the main logic for evaluating NL-to-SQL performance.

Parameters:
  • model – The name of the NL-to-SQL model.

  • service – The name of the service providing the model.

  • naturalness – The naturalness level of the schema (e.g., “NATIVE”, “N1”, “N2”, “N3”).

  • database – The name of the database to use.

  • bypass_nl_sql_inference – Whether to bypass NL-to-SQL inference and load predictions from a file. Defaults to True.

  • db_list_file – Path to the database information JSON file.

Raises:

FileNotFoundError – If bypass_nl_sql_inference is True and the predicted queries file is not found.
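The bypass behavior described above can be illustrated with a minimal sketch. It is not the module's implementation: the helper names load_predictions and get_predictions, the generate callback, and the JSON-list file format are all assumptions made for illustration. It only shows the documented contract, where bypassing inference reads saved predictions and a missing file raises FileNotFoundError.

```python
import json
import tempfile
from pathlib import Path


def load_predictions(path: Path) -> list:
    """Load previously saved predicted queries (JSON list format is an assumption)."""
    if not path.is_file():
        raise FileNotFoundError(f"Predicted queries file not found: {path}")
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def get_predictions(questions: list, predictions_path: Path,
                    bypass: bool = True, generate=None) -> list:
    """Hypothetical helper: reuse saved predictions, or call a generator per question."""
    if bypass:
        # Bypass mode never calls the model; it depends entirely on the saved file.
        return load_predictions(predictions_path)
    return [generate(q) for q in questions]


with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "predictions.json"
    saved.write_text(json.dumps(["SELECT 1"]), encoding="utf-8")
    print(get_predictions(["How many rows?"], saved))

    try:
        get_predictions(["How many rows?"], Path(tmp) / "missing.json")
    except FileNotFoundError as err:
        print("bypass failed:", type(err).__name__)
```

Since bypass_nl_sql_inference defaults to True, a first run without saved predictions would presumably need to pass False explicitly.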

mp_query_parse_function(query_data: tuple) → tuple

Parses a SQL query using an external Java tool and returns its statistics.

Parameters:

query_data – A tuple containing the query number, the SQL query string, and the SQL dialect.

Returns:

A tuple containing the query number and a dictionary of query statistics.
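The tuple-in, tuple-out shape suggests the function is meant to be mapped over a worker pool, with the query number carried along so results can be matched back to their queries regardless of completion order. The sketch below illustrates that pattern under stated assumptions: fake_parse is a hypothetical stand-in that counts tokens instead of calling the Java tool, the statistics keys are invented, and ThreadPool is used so the example runs without a main guard (a process pool such as multiprocessing.Pool offers the same imap_unordered method).

```python
from multiprocessing.pool import ThreadPool


def fake_parse(query_data: tuple) -> tuple:
    """Hypothetical stand-in for mp_query_parse_function (no Java tool involved)."""
    query_number, sql, dialect = query_data
    tokens = sql.replace(",", " ").split()
    stats = {
        "dialect": dialect,
        "token_count": len(tokens),
        "join_count": sum(1 for t in tokens if t.upper() == "JOIN"),
    }
    # Return the query number with the stats so the caller can re-associate them.
    return query_number, stats


def parse_all(queries: list, dialect: str, workers: int = 2) -> dict:
    """Parse every query in a pool and index the statistics by query number."""
    work = [(i, sql, dialect) for i, sql in enumerate(queries)]
    with ThreadPool(workers) as pool:
        # Results may arrive in any order; the dict keys restore the association.
        return dict(pool.imap_unordered(fake_parse, work))


queries = [
    "SELECT a, b FROM t",
    "SELECT * FROM t JOIN u ON t.id = u.id",
]
results = parse_all(queries, "sqlite")
print(results[1]["join_count"])
```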

mp_schema_linking_eval(data: tuple) → tuple

Evaluates schema linking by comparing gold and predicted queries.

Parameters:

data – A tuple containing the query number, the gold query, and the predicted query.

Returns:

A tuple containing the query number and a dictionary of schema linking evaluation results.
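One plausible way to compare a gold and a predicted query is to collect the identifiers each one mentions and score their overlap. The sketch below does that with a regular expression; it is an illustration only. The function names, the keyword list, and the result keys (recall, precision, missing) are assumptions, and the actual metrics returned by mp_schema_linking_eval are not documented here.

```python
import re

# Small, deliberately incomplete keyword list used only for this illustration.
SQL_KEYWORDS = {
    "select", "from", "where", "join", "on", "and", "or",
    "group", "by", "order", "as", "limit",
}


def identifiers(sql: str) -> set:
    """Return lower-cased word tokens that are not in the keyword list.

    Note: this naive tokenizer would also pick up words inside string literals.
    """
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql)
    return {t.lower() for t in tokens if t.lower() not in SQL_KEYWORDS}


def linking_eval_sketch(data: tuple) -> tuple:
    """Hypothetical stand-in: score identifier overlap between gold and predicted SQL."""
    query_number, gold, predicted = data
    gold_ids, pred_ids = identifiers(gold), identifiers(predicted)
    matched = gold_ids & pred_ids
    scores = {
        "recall": len(matched) / len(gold_ids) if gold_ids else 0.0,
        "precision": len(matched) / len(pred_ids) if pred_ids else 0.0,
        "missing": sorted(gold_ids - pred_ids),
    }
    return query_number, scores


number, scores = linking_eval_sketch((
    7,
    "SELECT name FROM species WHERE park_id = 3",
    "SELECT name FROM species",
))
print(number, scores["missing"])
```

Here the predicted query omits park_id, so it is reported as missing and recall drops below 1 while precision stays at 1.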

nl_to_sql_generation(q_nl_df: pd.DataFrame, bypass: bool = False, naturalness: str = None, db_name: str = None, config_dict: dict = None, nat_cat_dict: dict = None, db_info: dict = None, db_list_file: str = '.local/dbinfo.json', db_util=src.util.db_util) → pd.DataFrame

Generates SQL queries from natural language questions.

Parameters:
  • q_nl_df – DataFrame containing natural language questions and other information.

  • bypass – Whether to bypass NL-to-SQL generation and load predictions from a file. Defaults to False.

  • naturalness – Naturalness level for schema elements.

  • db_name – Name of the database.

  • config_dict – Configuration dictionary.

  • nat_cat_dict – Dictionary mapping naturalness levels to numeric values.

  • db_info – Database information dictionary.

  • db_list_file – Path to database information JSON file.

  • db_util – Database utility module.

Returns:

DataFrame with generated SQL queries.