JaneGPT v2 Janus — Intent Classification Model





Hierarchical command understanding with state-aware runtime behavior for practical assistant workflows.

  • 7.95M parameters
  • 82 runtime turns
  • 0 errors
  • 25.3 ms mean latency
  • 100% OOD precision
  • 30.6 MB checkpoint

🏛️ The Temple of Janus (Web Experience)

We have deployed a dedicated interactive web environment to showcase JaneGPT-v2 Janus.

Note: This is a visual and technical walkthrough; it does not feature a live chat interface.


Quickstart (2 minutes)

Install + first prediction

```shell
pip install -r requirements.txt
```

```python
from janegpt_v2_janus.inference import JaneGPTv3NLU

nlu = JaneGPTv3NLU(
    model_path="weights/janegpt_v2_janus.pt",
    tokenizer_path="weights/tokenizer.json",
)

state = {}
result = nlu.predict("set volume", state=state)
print(result)

if result.get("type") == "command":
    state = nlu.update_state(result, state)
```
Runtime wrapper (recommended for assistant flows)

```python
from runtime.jane_nlu_runtime import JaneNLURuntime

rt = JaneNLURuntime(base_dir=".")
state = {}

out, state = rt.handle_turn("set volume", state)
print(out)  # expected: clarify prompt for missing VALUE

out, state = rt.handle_turn("55", state)
print(out)  # expected: resolved local command
```
Run bundled demos

```shell
python examples/demo_inference.py
python examples/demo_runtime.py
python examples/demo_runtime_suite.py
```

What You Get

  • Single-pass multitask prediction: domain + action + BIO slots.
  • Runtime-safe clarification loops for missing required slots.
  • Stateful follow-ups (for example, "that is not enough" after a volume change).
  • Local command routing with controlled chat fallback.
  • Compact deployment footprint: ~30.62 MB checkpoint.
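To make the routing behavior in the list above concrete, here is a hypothetical sketch of a confidence-thresholded router. The threshold value, the field names (`confidence`, `missing_slots`), and the `conversation`-domain check are illustrative assumptions; the actual logic lives in `runtime/jane_nlu_runtime.py`.

```python
# Hypothetical routing sketch: local execution vs. clarification vs. chat fallback.
# Threshold and field names are assumptions, not the runtime's real API.
def route(prediction, threshold=0.85):
    if prediction["confidence"] < threshold:
        return "chat"                       # low confidence: controlled chat fallback
    if prediction.get("missing_slots"):
        return "clarify"                    # a required slot is absent: ask a follow-up
    if prediction["domain"] == "conversation":
        return "chat"                       # explicit conversational intent
    return "local"                          # confident command: execute locally

print(route({"domain": "apps", "confidence": 0.97, "missing_slots": []}))            # local
print(route({"domain": "volume", "confidence": 0.95, "missing_slots": ["VALUE"]}))   # clarify
```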

Model Architecture


🔤 1. Tokenization & Embedding Layer
Input text is converted to token IDs and projected into a 256-dimensional embedding space.
  • Tokenizer: BPE, vocab = 8,192
  • Max length: 96 tokens
  • Embedding: 8,192 → 256 dim
  • Output shape: (batch, 96, 256)
2. Transformer Backbone (8 Blocks)
Bidirectional attention layers with residual connections. Each block processes hidden states through grouped-query attention and a feed-forward network.
  • Attention type: Grouped Query Attention (GQA)
  • Query heads: 8
  • KV heads: 4 (2:1 sharing)
  • Head dimension: 32 (256 ÷ 8)
  • Position embedding: RoPE
  • FFN expansion: 256 → 672 → 256
  • FFN activation: SwiGLU
  • Normalization: RMSNorm
  • Causal masking: off (bidirectional)
  • Dropout rate: 0.1
Grouped-query attention reduces the KV cache by 50% while maintaining quality.
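The GQA head sharing can be sketched in a few lines of NumPy using the card's dimensions (8 query heads, 4 KV heads, head dim 32). This is an illustrative sketch, not the model's actual implementation.

```python
import numpy as np

# Sketch of Grouped Query Attention: 2 query heads share each KV head,
# so only half the K/V tensors need to be computed and cached.
def gqa_attention(q, k, v, n_q_heads=8, n_kv_heads=4):
    # q: (seq, n_q_heads, head_dim); k, v: (seq, n_kv_heads, head_dim)
    group = n_q_heads // n_kv_heads
    k = np.repeat(k, group, axis=1)          # broadcast KV heads to (seq, n_q_heads, head_dim)
    v = np.repeat(v, group, axis=1)
    scale = q.shape[-1] ** -0.5
    scores = np.einsum("qhd,khd->hqk", q, k) * scale   # bidirectional: no causal mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over key positions
    return np.einsum("hqk,khd->qhd", weights, v)

rng = np.random.default_rng(0)
out = gqa_attention(rng.normal(size=(5, 8, 32)),
                    rng.normal(size=(5, 4, 32)),
                    rng.normal(size=(5, 4, 32)))
print(out.shape)  # (5, 8, 32)
```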
3. Multi-Task Prediction Heads (Parallel)
Three independent classification heads process the backbone output simultaneously for domain, action, and slot predictions.
  • Domain head: last token (pooled) → Linear(256) → GELU → Dropout → Linear(10); 10 classes
  • Action head: last token (pooled) → Linear(256) → GELU → Dropout → Linear(33); 33 classes
  • Slot head: all tokens → Linear(256) → Linear(15); 15 BIO labels per token
4. Output & Post-Processing
Raw logits are converted to predictions; for slots, BIO tags are decoded into semantic spans.
  • Domain output: 10 classes
  • Action output: 33 classes
  • Slot decoder: BIO → spans
  • Confidence: softmax scores
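The BIO-to-span decoding step can be illustrated with a minimal decoder. The tag names follow the card's slot schema; the exact span dictionary shape is an assumption for illustration.

```python
# Minimal BIO decoder sketch: merge B-/I- tag runs into labeled spans.
def bio_to_spans(tokens, tags):
    spans, current = [], None
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        if tag.startswith("B-"):                       # a new span begins
            if current:
                spans.append(current)
            current = {"label": tag[2:], "start": i, "end": i + 1, "text": tok}
        elif tag.startswith("I-") and current and tag[2:] == current["label"]:
            current["end"] = i + 1                     # extend the open span
            current["text"] += " " + tok
        else:                                          # O tag or mismatched I-: close span
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans

print(bio_to_spans(["set", "volume", "to", "55"], ["O", "O", "O", "B-VALUE"]))
# [{'label': 'VALUE', 'start': 3, 'end': 4, 'text': '55'}]
```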

Training Objective

Weighted Multi-Task Loss
loss = 1.0 × L_domain + 1.0 × L_action + 1.5 × L_slots

where:
  • L_domain = CrossEntropy(domain_logits, domain_labels)
  • L_action = CrossEntropy(action_logits, action_labels)
  • L_slots = CrossEntropy(slot_logits, slot_labels), with ignore_index = -100 for padding

The higher slot weight (1.5×) reflects the greater difficulty of sequence tagging relative to sentence-level classification.

Architecture Specifications

| Component | Configuration | Details |
|---|---|---|
| Backbone type | Transformer (GPT-style) | Bidirectional, non-causal attention |
| Vocabulary size | 8,192 | BPE tokenization |
| Embedding dim | 256 | Token + rotary position embeddings |
| Attention heads | 8 query, 4 KV | Grouped Query Attention (GQA) for efficiency |
| Head dimension | 32 per head | head_dim = embed_dim / num_heads |
| Transformer blocks | 8 layers | Each with attention + FFN + residuals |
| Feed-forward hidden | 672 | SwiGLU gate activation |
| Position encoding | RoPE | Rotary position embeddings (theta = 10000) |
| Normalization | RMSNorm | Pre-layer normalization |
| Max sequence length | 96 tokens | Approximately 60-80 words |
| Dropout rate | 0.1 | Applied during training |
| Total parameters | 7,949,626 | All trainable |
| Parameter breakdown | Backbone 7.80M, task heads 146K | Efficient multitask design |
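The reported ~7.80M backbone can be sanity-checked with a back-of-envelope count, assuming bias-free linear layers (common alongside RMSNorm) and a final RMSNorm; exact bookkeeping may differ slightly from the released checkpoint.

```python
# Back-of-envelope backbone parameter count from the specs above.
# Assumptions: no biases, GQA K/V projections are 256 -> 128 (4 heads x 32),
# SwiGLU uses three projections (gate, up, down), plus a final RMSNorm.
embed = 8192 * 256                                  # token embedding table
attn = 256*256 + 256*128 + 256*128 + 256*256        # Q, K, V, output projections
ffn = 3 * (256 * 672)                               # SwiGLU gate + up + down
norms = 2 * 256                                     # two RMSNorm scales per block
block = attn + ffn + norms
backbone = embed + 8 * block + 256                  # 8 blocks + final RMSNorm
print(f"{backbone:,}")  # 7,803,136 ≈ 7.80M
```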

Task Configuration

| Task | Type | Classes | Architecture |
|---|---|---|---|
| Domain classification | Sequence-level | 10 domains | Pooled → Linear(256) → GELU → Linear(10) |
| Action classification | Sequence-level | 33 actions | Pooled → Linear(256) → GELU → Linear(33) |
| Slot tagging | Token-level | 15 BIO labels | Per-token → Linear(256) → Linear(15) |
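The shape flow through the three heads can be illustrated with random weights (a tanh approximation of GELU is used here purely to keep the sketch self-contained; none of this is the trained model).

```python
import numpy as np

# Shape-only sketch of the three parallel heads over one backbone output.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(96, 256))                 # backbone output: (seq=96, dim=256)

def linear(x, d_out):                               # random-weight stand-in for nn.Linear
    return x @ rng.normal(size=(x.shape[-1], d_out)) * 0.02

def gelu(x):                                        # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

pooled = hidden[-1]                                 # last-token pooling for sequence heads
domain_logits = linear(gelu(linear(pooled, 256)), 10)   # (10,)
action_logits = linear(gelu(linear(pooled, 256)), 33)   # (33,)
slot_logits = linear(linear(hidden, 256), 15)           # (96, 15), one BIO label per token
print(domain_logits.shape, action_logits.shape, slot_logits.shape)
```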

Benchmark Results

Runtime reliability (82-turn suite): 82 turns · 67 local · 12 clarify · 0 errors. Source: fair_benchmarks.json

Predict latency (CUDA, batch=1, lower is better): predict mean 25.3 ms · predict p95 34.6 ms · forward mean 35.4 ms · forward p95 36.7 ms. Source: janus_model_report.json

OOD rejection quality (schema-agnostic): BANKING77 F1 87.8% · precision 100% · recall 78.3%; CLINC F1 79.2% · precision 100% · recall 65.6%. Source: fair_benchmarks.json

Comprehensive Benchmark Summary

Full Benchmark Evidence
All values from real holdout evaluations — no synthetic or inflated numbers
| Metric | Detail | Jane v2 | Janus |
|---|---|---|---|
| Speed (mean latency) | CUDA, batch=1 | 31.60 ms | 25.31 ms |
| Throughput | CUDA, single GPU | 32 pred/sec | Stable across 82 turns, 0 errors |
| OOD F1 | BANKING77 | 94.31% | 87.80% |
| OOD F1 | CLINC OOS | 89.16% | 79.23% |
| OOD Precision | BANKING77 | 99.35% | 100.00% |
| OOD Precision | CLINC OOS | 99.14% | 100.00% |
| OOD Recall | BANKING77 | 89.75% | 78.25% |
| OOD Recall | CLINC OOS | 81.00% | 65.60% |
| Validation accuracy | Domain (best epoch) | | 99.83% |
| Validation accuracy | Action (best epoch) | | 99.87% |
| Validation accuracy | Domain+action pair (best epoch) | | 99.83% |
| Slot extraction F1 | All 15 slot labels | | 1.000 (100%) |
| Training loss | Epochs 1 → 4 | | 0.060 → 0.020 → 0.002 → 0.001 |
| Validation loss | Epochs 1 → 3 | | 0.0153 → 0.0116 → 0.0115 (stable) |
| Runtime reliability | 82-turn conversation test | | 0 errors, 0 crashes |
| Domain confusion | 10 domains | | 99%+ per-domain, minimal cross-confusion |
| Action confusion | 33 actions | | Clean diagonal, no action commonly confused |

Live Output Shapes

Command output

```json
{
  "type": "command",
  "domain": "apps",
  "action": "launch",
  "slots": {
    "APP_NAME": {
      "text": "chrome",
      "start": 5,
      "end": 11,
      "confidence": 0.999
    }
  },
  "confidence": 0.97,
  "route": "local"
}
```

Clarification output

```json
{
  "type": "clarify",
  "question": "What value should I set it to?",
  "debug": {
    "domain": "volume",
    "action": "set",
    "reason": "missing_VALUE"
  }
}
```
Label schema
  • Domains (10): volume, brightness, media, apps, browser, productivity, screen, window, system, conversation
  • Actions (33): up, down, set, mute, unmute, play, pause, next, previous, launch, close, switch, search, set_reminder, screenshot, read, explain, undo, quit, chat, minimize, maximize, restore, focus, copy, paste, cut, lock, sleep, wifi_on, wifi_off, bluetooth_on, bluetooth_off
  • Slot labels (BIO, 15): VALUE, APP_NAME, QUERY, DURATION, TIME, WINDOW_NAME, TEXT
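The count of 15 BIO labels follows directly from the 7 slot types, as a quick check shows:

```python
# 7 slot types each get a B- and an I- tag, plus the single O tag: 7*2 + 1 = 15.
slots = ["VALUE", "APP_NAME", "QUERY", "DURATION", "TIME", "WINDOW_NAME", "TEXT"]
labels = ["O"] + [f"{prefix}-{slot}" for slot in slots for prefix in ("B", "I")]
print(len(labels))  # 15
```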

Visual Benchmark Evidence

Train and validation loss

Smoothed train loss

Validation slot F1

Confusion Matrix — Per-Class Breakdown

Domain sample distribution (validation set, 3,110 samples). Every domain scored 100% accuracy with 0 misclassifications:

| Domain | Samples | Share |
|---|---|---|
| volume | 430 | 13.8% |
| brightness | 250 | 8.0% |
| media | 340 | 10.9% |
| apps | 250 | 8.0% |
| browser | 120 | 3.9% |
| productivity | 340 | 10.9% |
| screen | 120 | 3.9% |
| window | 340 | 10.9% |
| system | 580 | 18.6% |
| conversation | 340 | 10.9% |

Action sample distribution (validation set, 3,205 samples). Every action scored 100% accuracy with 0 misclassifications: up 170 (5.3%), down 165 (5.1%), set 170 (5.3%); each of the remaining 30 actions (mute, unmute, play, pause, next, previous, launch, close, switch, search, set_reminder, screenshot, read, explain, undo, quit, chat, minimize, maximize, restore, focus, copy, paste, cut, lock, sleep, wifi_on, wifi_off, bluetooth_on, bluetooth_off) has 90 samples (2.8%).

Source: validation set confusion matrices.

Domain confusion matrix

Action confusion matrix

Additional diagnostics

Learning rate schedule

Epoch time profile

Raw training loss


Upload-Ready Layout

```
.
|- README.md
|- .gitattributes
|- LICENSE
|- requirements.txt
|- assets/
|  |- jane-janus-glitch.webp
|- janegpt_v2_janus/
|  |- __init__.py
|  |- architecture.py
|  |- dataset.py
|  |- inference.py
|  |- labels.py
|  |- multitask.py
|- runtime/
|  |- jane_nlu_runtime.py
|- examples/
|  |- demo_inference.py
|  |- demo_runtime.py
|  |- demo_runtime_suite.py
|- weights/
|  |- janegpt_v2_janus.pt
|  |- tokenizer.json
|- reports/
|  |- fair_benchmarks.json
|  |- fair_benchmarks.md
|  |- janus_model_report.json
|  |- janus_model_report.md
|  |- public_benchmarks.json
|  |- *.png benchmark visuals
```

Limitations

  • English-focused command language.
  • Command NLU model, not an open-domain generative chatbot.
  • MASSIVE and SNIPS mapped-intent accuracy is excluded from headline claims because mapping coverage is partial.

Use Cases

  • Virtual assistant command routing
  • Smart home intent classification
  • Voice command understanding
  • Chatbot intent detection
  • Edge device deployment (small enough for embedded systems)

Part of the JANE Project

JANE — a fully offline, privacy-first AI voice assistant.

🔗 JANE AI Assistant on GitHub 🔗 JaneGPT-v2 on GitHub


Created By

Ravindu Senanayake

Built from scratch — architecture, tokenizer, and training pipeline designed and implemented by the author.

GitHub


License

Apache-2.0 (see LICENSE).
