.. _notebooks:

SNAILS Reproducibility Notebooks
================================

The SNAILS repository contains 10 numbered Jupyter notebooks in the root of the project. Notebooks 1-3 must be run in sequence; notebooks 4-10 may be run in any order (a minimal headless-execution sketch appears at the end of this page). Before running any notebooks, you must install the dependencies (:ref:`installing`) and instantiate the SNAILS databases (:ref:`databases`).

1. `Run NL-to-SQL Inference and Auto Scoring`_: Generates and evaluates SQL queries for both execution accuracy and schema linking performance. Outputs to ``./data/nl-to-sql-performance_annotations/pending_evaluation/``.

   .. note::
      After running notebook 1, and before running the notebook 2 analysis, manually review the generated SQL using the manual validation tool (:ref:`manual_validation`).

2. `Run Statistical Tests and Create Charts`_: Loads validated performance annotations from ``./data/nl-to-sql-performance_annotations/`` and generates statistical tests and charts.

3. `Run Identifier-Focused Analysis`_: Loads validated performance annotations from ``./data/nl-to-sql-performance_annotations/`` and provides an identifier-focused schema linking performance metric.

4. `Tokenizer Analysis`_: Tokenizes SNAILS identifiers and explores their properties.

5. `Token Naturalness Analysis`_: Explores the alignment of tokens with natural language.

6. `Naturalness Comparisons`_: Compares the naturalness of SNAILS, Spider, Bird, and SchemaPile.

7. `SchemaPile Naturalness`_: ETL scripts for SchemaPile extraction and evaluation.

8. `CodeS Query Execution and Selection`_: Augments the CodeS process to select the first correct SQL query.

9. `DINSQL CodeS Schema Subsetting Analysis`_: Evaluates schema subsets generated by CodeS and DINSQL.

10. `Spider Query Analysis`_: Creates performance metrics for Spider DEV (native and modified) inference.

.. toctree::
   :maxdepth: 2
   :caption: Contents:
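
If you prefer to run the sequential notebooks headlessly, the sketch below executes them in numeric order with ``nbclient``. The numeric-prefix filename matching is an assumption (e.g. ``1_<name>.ipynb``); adjust it to the repository's actual filenames. Keep in mind that the manual validation step above still applies, so in practice you would execute notebook 1, validate the generated SQL, and only then run notebooks 2 and 3.

.. code-block:: python

   """Minimal sketch: execute notebooks 1-3 in order from the repository root.

   Assumes each notebook filename starts with its number (hypothetical
   pattern); adjust the matching if the real names differ.
   """
   import re
   from pathlib import Path

   import nbformat
   from nbclient import NotebookClient


   def notebook_number(path):
       """Return the leading integer of a notebook filename, or None."""
       match = re.match(r"(\d+)", path.name)
       return int(match.group(1)) if match else None


   # Collect notebooks 1-3 from the project root and sort them by number.
   ordered = sorted(
       (p for p in Path(".").glob("*.ipynb") if notebook_number(p) in (1, 2, 3)),
       key=notebook_number,
   )

   for nb_path in ordered:
       nb = nbformat.read(str(nb_path), as_version=4)
       NotebookClient(nb, timeout=None).execute()  # run all cells top to bottom
       nbformat.write(nb, str(nb_path))            # save outputs back in place

The same loop can be pointed at notebooks 4-10, which are order-independent, by changing the number filter.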