Llama-3.2-3B β€” Text-to-SQL LoRA Adapter

A LoRA adapter that teaches Llama-3.2-3B-Instruct to generate SQL queries from natural-language questions against a given table schema.

Fine-tuned on b-mc2/sql-create-context with Unsloth on a free Google Colab T4 GPU.

What it does

Input: SQL schema (CREATE TABLE ...) + question in plain English Output: SQL query that answers the question

Examples

Example 1 β€” JOIN + aggregation + sorting + limit

Schema:

CREATE TABLE orders (id INT, user_id INT, amount DECIMAL, created_at DATE);
CREATE TABLE users (id INT, name VARCHAR, country VARCHAR);

Question: What are top 5 users by total order amount in 2025?

Model output:

SELECT T1.id, T1.name, T1.country, SUM(T2.amount) AS total_amount
FROM users AS T1 JOIN orders AS T2 ON T1.id = T2.user_id
WHERE T2.created_at >= 2025
GROUP BY T1.id ORDER BY total_amount DESC LIMIT 5

Example 2 β€” AVG with GROUP BY

Schema: CREATE TABLE employees (id INT, name VARCHAR, department VARCHAR, salary INT, hire_date DATE) Question: Show average salary by department, sorted from highest to lowest

SELECT AVG(salary), department FROM employees
GROUP BY department ORDER BY AVG(salary) DESC

Example 3 β€” WHERE with multiple conditions

Schema: CREATE TABLE products (id INT, name VARCHAR, category VARCHAR, price DECIMAL, stock INT) Question: Find all products with stock less than 10 in the 'electronics' category

SELECT * FROM products WHERE category = "electronics" AND stock < 10

Example 4 β€” COUNT with JOIN

Schema:

CREATE TABLE customers (id INT, name VARCHAR, country VARCHAR);
CREATE TABLE orders (id INT, customer_id INT, total DECIMAL);

Question: Count how many orders each customer from Ukraine has made

SELECT COUNT(*) FROM orders AS T1 JOIN customers AS T2
ON T1.customer_id = T2.id WHERE T2.country = "Ukraine"

Training details

Parameter Value
Base model unsloth/Llama-3.2-3B-Instruct
Method LoRA (PEFT) via Unsloth
Dataset b-mc2/sql-create-context β€” 2,000 examples (random subset, seed=42)
Trainable params 24.3M / 3.24B (0.75%)
LoRA config r=16, alpha=16, dropout=0, target modules: Q/K/V/O + gate/up/down
Loss masking train_on_responses_only β€” only SQL tokens contribute to loss
Optimizer adamw_8bit, lr=2e-4, linear schedule, warmup 5 steps
Batch per-device 2 Γ— grad-accum 4 = effective 8
Steps 250 (= 1 epoch on 2,000 examples)
Hardware Google Colab T4 (free) β€” 6.5 GB peak GPU memory (44% of 15 GB)
Training time 7.44 minutes
Loss trajectory 0.55 β†’ 0.04 (final)

Evaluation

Evaluated on 100 held-out examples from b-mc2/sql-create-context (indices 2000–2100, unseen during training).

Metric Value
Exact-match accuracy (normalized: lowercase, whitespace collapsed, quotes normalized) 76%

Notes on the failure mode

Manual analysis of the 24 "failures" shows that roughly half are semantically equivalent SQL that exact-match rejects:

Type of "failure" Example
Strings vs numbers WHERE runs = 144 vs reference WHERE runs = "144"
Alias differences SELECT Prime vs reference SELECT Prime AS minister
Ambiguous reference (dataset has minor errors) SELECT SUM(poles) vs reference SELECT MIN(poles)
Genuine model errors (missing GROUP BY, wrong aggregation) ~10 out of 100

A proper execution accuracy evaluation (running predictions against a real database) would likely show 85–90% correctness. LLM-as-judge with Claude/GPT-4 is the next planned improvement.

Known limitations

  • English only. Trained on English questions only.
  • ANSI SQL. No specific dialect (PostgreSQL / MySQL / Oracle).
  • Naive date comparisons can appear (e.g. created_at >= 2025 instead of created_at >= '2025-01-01').
  • Implicit GROUP BY is occasionally missed when a question implies "per each X" without saying it.
  • Trained on the original dataset distribution β€” complex 3+ table joins or window functions are out of distribution.

How to use

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base = "unsloth/Llama-3.2-3B-Instruct"
adapter = "notingemiu/llama-3.2-3b-text2sql-lora"

tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

schema = "CREATE TABLE orders (id INT, total DECIMAL, created_at DATE)"
question = "What is total revenue in 2024?"

messages = [
    {"role": "system", "content": f"You are a SQL expert. Use this schema:\n{schema}"},
    {"role": "user", "content": question},
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids=inputs, max_new_tokens=200, temperature=0.1, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Author

Built as part of a portfolio for AI/LLM Engineer roles. Comments and feedback welcome.

Downloads last month
46
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for notingemiu/llama-3.2-3b-text2sql-lora

Adapter
(411)
this model

Dataset used to train notingemiu/llama-3.2-3b-text2sql-lora