SDE-SQL: Enhancing Text-to-SQL Generation in Large Language Models via Self-Driven Exploration with SQL Probes

MKcodeshere/Text-to-SQL---Using-SDE-SQL-Method

SDE-SQL Prototype

Natural language to SQL interface using Self-Driven Exploration with SQL Probes. Query PostgreSQL databases using plain English with real-time exploration visualization.

Overview

This application implements the SDE-SQL methodology from the paper "Enhancing Text-to-SQL Generation in Large Language Models via Self-Driven Exploration with SQL Probes" (arXiv:2506.07245v2). It features:

  • Natural language to SQL conversion using GPT-4o
  • Real-time exploration traces showing how SQL is constructed
  • DuckDB WASM for client-side SQL execution
  • Interactive chat interface with query results visualization
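To make the idea of a SQL probe concrete, here is a toy sketch. It is not the paper's algorithm or this repository's code: it only illustrates running small exploratory queries and recording a trace before a final query is written. An in-memory SQLite table stands in for PostgreSQL, and the names `run_probe` and `trace` are hypothetical.

```python
import sqlite3

# Toy stand-in for the DVD Rental "country" table
# (assumption: SQLite replaces PostgreSQL for this illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (country_id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO country (country_id, country) VALUES (?, ?)",
    [(1, "Argentina"), (2, "Brazil"), (3, "Canada")],
)

trace = []  # hypothetical exploration trace: one entry per executed probe


def run_probe(connection, sql, params=()):
    """Execute a small exploratory query and record what it returned."""
    rows = connection.execute(sql, params).fetchall()
    trace.append({"sql": sql, "params": params, "row_count": len(rows)})
    return rows


# Probe 1: does the literal from the question exist exactly as typed?
exact = run_probe(conn, "SELECT country FROM country WHERE country = ?", ("argentina",))

# Probe 2: the exact match came back empty, so retry case-insensitively.
fuzzy = run_probe(
    conn,
    "SELECT country FROM country WHERE lower(country) = lower(?)",
    ("argentina",),
)

# The probe results indicate which stored value the final SQL should filter on.
resolved_value = fuzzy[0][0] if fuzzy else None
print(resolved_value)
for step in trace:
    print(step)
```

Each trace entry here is roughly the kind of information an exploration panel could display; the actual trace format used by the application is not specified in this README.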

Prerequisites

Before you start, ensure you have:

  • Python 3.9+ installed
  • Node.js 18+ and npm installed
  • PostgreSQL running locally (localhost:5432)
  • DVD Rental Database loaded in PostgreSQL
  • OpenAI API Key (for GPT-4o)

Quick Start

1. Clone and Navigate

cd sde-sql-prototype

2. Setup Backend

Create Environment File

cd backend
# Windows:
copy .env.example .env
# Linux/Mac:
cp .env.example .env

Edit .env with your credentials:

OPENAI_API_KEY=sk-your-actual-api-key-here
DATABASE_HOST=localhost
DATABASE_PORT=5432
DATABASE_NAME=dvdrental
DATABASE_USER=postgres
DATABASE_PASSWORD=your_postgres_password

Start Backend (Windows)

start.bat

The script will:

  • Create Python virtual environment
  • Install dependencies from requirements.txt
  • Run infrastructure tests
  • Start FastAPI server on http://localhost:8000

Start Backend (Linux/Mac)

./start.sh

3. Setup Frontend

Open a new terminal:

cd frontend

Start Frontend (Windows)

start.bat

The script will:

Start Frontend (Linux/Mac)

npm install
npm run dev

4. Access Application

Open your browser and navigate to http://localhost:5173

Manual Setup

Backend Setup (Detailed)

cd backend

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
# Windows:
copy .env.example .env
# Linux/Mac:
cp .env.example .env
# Edit .env with your settings

# Test infrastructure
python test_infrastructure.py

# Start server
python main.py

Frontend Setup (Detailed)

cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production (optional)
npm run build

Project Structure

sde-sql-prototype/
├── backend/
│   ├── main.py                 # FastAPI entry point
│   ├── start.bat              # Windows startup script
│   ├── start.sh               # Linux/Mac startup script
│   ├── requirements.txt       # Python dependencies
│   ├── .env.example          # Environment template
│   ├── config/               # Configuration & prompts
│   ├── core/                 # Pipeline orchestration
│   ├── modules/              # SDE-SQL modules
│   ├── database/             # Database connectors
│   ├── utils/                # Utilities (LSH, embeddings)
│   └── api/                  # REST & WebSocket endpoints
│
├── frontend/
│   ├── package.json          # Node dependencies
│   ├── start.bat            # Windows startup script
│   ├── vite.config.ts       # Vite configuration
│   └── src/
│       ├── App.tsx          # Main application
│       ├── components/      # React components
│       ├── services/        # API & WebSocket clients
│       └── types/           # TypeScript types
│
└── README.md                # This file

Database Setup

Install DVD Rental Database

  1. Download the database:
wget https://www.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip
unzip dvdrental.zip
  2. Create and restore:
# Connect to PostgreSQL
psql -U postgres

# Create database
CREATE DATABASE dvdrental;
\q

# Restore backup
pg_restore -U postgres -d dvdrental dvdrental.tar
  3. Verify:
psql -U postgres -d dvdrental -c "\dt"

You should see tables: customer, film, actor, rental, etc.

Testing

Backend Tests

cd backend
python test_infrastructure.py

Expected output: All 15 infrastructure tests should pass.

Frontend Tests

cd frontend
npm run lint

Usage

  1. Start both backend and frontend using the startup scripts
  2. Open http://localhost:5173 in your browser
  3. Type a natural language question, e.g., "Find all customers from Argentina"
  4. Watch the real-time exploration traces on the right panel
  5. View the generated SQL and query results in the chat interface

Example Queries

Try these with the DVD Rental database:

  • "Find all customers from Argentina"
  • "What are the top 5 most rented films?"
  • "List all actors who appeared in films rented by customer ID 100"
  • "Show me the total revenue by store"
  • "Which films have never been rented?"
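As a point of comparison for the last question, the sketch below shows one hand-written SQL formulation of "Which films have never been rented?". It runs against a three-table miniature of the DVD Rental schema in SQLite. The film titles and rows are invented for illustration, the real tables have more columns, and the SQL the application generates may look different.

```python
import sqlite3

# Miniature of the film -> inventory -> rental chain (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER);

    INSERT INTO film VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO inventory VALUES (10, 1), (11, 2);   -- film 3 has no copies
    INSERT INTO rental VALUES (100, 10);             -- only film 1 was rented
    """
)

# LEFT JOINs keep films with no inventory or no rentals;
# HAVING then keeps only films whose rental count is zero.
NEVER_RENTED_SQL = """
SELECT f.title
FROM film AS f
LEFT JOIN inventory AS i ON i.film_id = f.film_id
LEFT JOIN rental AS r ON r.inventory_id = i.inventory_id
GROUP BY f.film_id, f.title
HAVING COUNT(r.rental_id) = 0
ORDER BY f.title
"""

never_rented = [row[0] for row in conn.execute(NEVER_RENTED_SQL)]
print(never_rented)  # ['Beta', 'Gamma']
```

The question needs two joins because rentals reference inventory copies rather than films directly, which makes it a useful test of whether the generated SQL follows the schema's join path.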

API Endpoints

REST API

  • POST /api/query - Submit natural language query
  • GET /api/databases - List available databases
  • GET /api/session/{session_id} - Get session results

WebSocket

  • ws://localhost:8000/ws/{session_id} - Real-time trace updates

See full API documentation at http://localhost:8000/docs
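A minimal client sketch for the endpoints above, using only the Python standard library. The request body is an assumption: the field names `question` and `database` are hypothetical, so check http://localhost:8000/docs for the real schema. The code builds the request without sending it, so it runs without a backend.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # default VITE_API_URL from this README
WS_BASE = "ws://localhost:8000"     # default VITE_WS_URL from this README


def build_query_request(question, database="dvdrental"):
    """Build (but do not send) a POST /api/query request.

    Assumption: the JSON field names "question" and "database" are
    hypothetical; the backend's actual request schema may differ.
    """
    payload = json.dumps({"question": question, "database": database}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trace_socket_url(session_id):
    """Return the WebSocket URL that streams trace updates for a session."""
    return f"{WS_BASE}/ws/{session_id}"


request = build_query_request("Find all customers from Argentina")
print(request.get_method(), request.full_url)
print(trace_socket_url("example-session"))

# To actually send it (requires the backend to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```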

Environment Variables

Backend (.env)

Variable            Description          Default
OPENAI_API_KEY      OpenAI API key       Required
OPENAI_MODEL        Model to use         gpt-4o
DATABASE_HOST       PostgreSQL host      localhost
DATABASE_PORT       PostgreSQL port      5432
DATABASE_NAME       Database name        dvdrental
DATABASE_USER       Database user        postgres
DATABASE_PASSWORD   Database password    Required
API_HOST            API server host      0.0.0.0
API_PORT            API server port      8000
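The sketch below shows one way the variables in this table could be resolved in Python: defaults are applied, the two required values are checked, and a PostgreSQL connection URL is assembled. The helper names `load_settings` and `postgres_dsn` are hypothetical, and the backend's own configuration code may work differently.

```python
import os
from urllib.parse import quote

# Defaults copied from the table above; values stay strings, as in a .env file.
DEFAULTS = {
    "OPENAI_MODEL": "gpt-4o",
    "DATABASE_HOST": "localhost",
    "DATABASE_PORT": "5432",
    "DATABASE_NAME": "dvdrental",
    "DATABASE_USER": "postgres",
    "API_HOST": "0.0.0.0",
    "API_PORT": "8000",
}
REQUIRED = ("OPENAI_API_KEY", "DATABASE_PASSWORD")


def load_settings(env=None):
    """Merge environment values with defaults and fail fast on missing secrets."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings.update({name: env[name] for name in REQUIRED})
    return settings


def postgres_dsn(settings):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    user = quote(settings["DATABASE_USER"], safe="")
    password = quote(settings["DATABASE_PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}@{settings['DATABASE_HOST']}"
        f":{settings['DATABASE_PORT']}/{settings['DATABASE_NAME']}"
    )


example = load_settings({"OPENAI_API_KEY": "sk-example", "DATABASE_PASSWORD": "p@ss"})
print(postgres_dsn(example))  # postgresql://postgres:p%40ss@localhost:5432/dvdrental
```

Percent-encoding matters when a password contains characters such as `@` or `/`, which would otherwise be read as URL delimiters.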

Frontend (.env)

Variable        Description        Default
VITE_API_URL    Backend API URL    http://localhost:8000
VITE_WS_URL     WebSocket URL      ws://localhost:8000

Troubleshooting

Backend Issues

Database Connection Failed

  • Check PostgreSQL is running
  • Verify credentials in .env
  • Test connection: psql -U postgres -d dvdrental
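If `psql` is not on the PATH, a quick reachability check can be sketched with the standard library alone. This only confirms that something accepts TCP connections on the port. It does not validate credentials or that the dvdrental database exists, and the helper name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    print("PostgreSQL reachable:", is_port_open())
```

A `False` result points to PostgreSQL not running or listening on a different host or port; a `True` result with a failing application points back to the credentials in .env.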

OpenAI API Error

Module Not Found

  • Activate virtual environment: venv\Scripts\activate (Windows) or source venv/bin/activate (Linux/Mac)
  • Reinstall: pip install -r requirements.txt

Frontend Issues

Port 5173 Already in Use

  • Stop other Vite instances
  • Or edit vite.config.ts to use a different port

Cannot Connect to Backend

  • Ensure backend is running on port 8000
  • Check CORS settings in backend .env

Technology Stack

Backend

  • FastAPI - Web framework
  • LangChain - LLM orchestration
  • GPT-4o - Language model
  • PostgreSQL - Database
  • SQLGlot - SQL parsing
  • asyncpg - Async database driver

Frontend

  • React 19 - UI framework
  • TypeScript - Type safety
  • Vite - Build tool
  • Axios - HTTP client
  • Prism.js - Syntax highlighting

Contributing

This is a research prototype. For issues or improvements:

  1. Check existing documentation
  2. Review the paper (arXiv:2506.07245v2)
  3. Test with the DVD Rental database first

License

Research prototype based on the SDE-SQL paper.

References

Support

For detailed setup instructions, see:

  • SETUP_INSTRUCTIONS.md - Infrastructure setup
  • backend/README.md - Backend details
  • frontend/README.md - Frontend details

Last Updated: 2025-10-12
