PIIBench Source-Conditioned Hierarchical DeBERTa

This is the source-conditioned hierarchical comparison model trained for the follow-up PIIBench experiments. It uses a DeBERTa-v3-base encoder, a coarse entity classification head, and a fine BIO classification head conditioned on the coarse distribution.
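
As a rough illustration of what "a fine head conditioned on the coarse distribution" can mean, the sketch below wires up a toy two-level head for a single token in plain Python. It is not the released implementation: the concatenation-based conditioning, the function names, the dimensions, and all weights are assumptions made for illustration. The actual model applies its custom head to DeBERTa-v3-base token representations.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, features):
    """Apply y = W x + b, with W given as a list of rows."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def hierarchical_head(hidden, coarse_w, coarse_b, fine_w, fine_b):
    """Toy two-level head for one token (assumed wiring, not the released code)."""
    # Coarse head: distribution over coarse entity categories.
    coarse_probs = softmax(linear(coarse_w, coarse_b, hidden))
    # Conditioning (assumption): concatenate the coarse distribution to the
    # encoder features before the fine BIO classifier sees them.
    fine_input = list(hidden) + coarse_probs
    fine_logits = linear(fine_w, fine_b, fine_input)
    return coarse_probs, fine_logits


# Hypothetical sizes: 3 hidden features, 2 coarse classes, 3 fine BIO labels.
hidden = [0.2, -0.1, 0.4]
coarse_w = [[0.5, 0.0, -0.5], [-0.5, 0.0, 0.5]]
coarse_b = [0.0, 0.0]
fine_w = [[0.1] * 5, [0.2] * 5, [-0.1] * 5]
fine_b = [0.0, 0.0, 0.0]

coarse_probs, fine_logits = hierarchical_head(
    hidden, coarse_w, coarse_b, fine_w, fine_b
)
print(coarse_probs, fine_logits)
```

The point of the sketch is only the data flow: the fine classifier receives both the token features and the coarse probabilities, so its BIO decision can depend on the coarse prediction.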

The simpler, directly fine-tuned model achieved the best overall result on the full held-out experiment test split and is published separately as Pritesh-2711/piibench-deberta-base.

Paper

This model is released with the paper:

Fine-Tuning Over Architectural Complexity: Broad-Coverage PII Detection on PIIBench with DeBERTa
arXiv: https://arxiv.org/abs/2605.25816
Hugging Face Papers: https://huggingface.co/papers/2605.25816

This repository corresponds to the source-conditioned hierarchical DeBERTa comparison model evaluated in the paper.

Results

The reported evaluation uses the later PIIBench experiment variant, which retains 82 entity types and has a held-out test split of 100,002 records. It does not use the earlier 48-type Hub dataset release.

Held-Out Evaluation              Records   F1      Precision  Recall
Corrected held-out subset        5,000     0.5899  0.5565     0.6274
Complete experiment test split   100,002   0.5894  0.5560     0.6270
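
As a quick consistency check on the figures above, the snippet below recomputes F1 from the reported precision and recall. It assumes F1 is the harmonic mean of those two numbers, as it would be for micro-averaged scores; the averaging scheme is not stated here, so treat this as a sanity check rather than a description of the evaluation code. The helper name `f1_from` is illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the results above.
REPORTED = {
    "Corrected held-out subset": (0.5565, 0.6274, 0.5899),
    "Complete experiment test split": (0.5560, 0.6270, 0.5894),
}

for name, (precision, recall, reported_f1) in REPORTED.items():
    recomputed = f1_from(precision, recall)
    print(f"{name}: recomputed {recomputed:.4f}, reported {reported_f1:.4f}")
```

Under that assumption the recomputed values agree with the reported F1 to within 0.0001, which is the precision one would expect from inputs rounded to four decimal places.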

Full-test SHA-256: 65f8edc86399ba3f9e4ba44591d4583f9271f5d1df20e30a913305049559df77
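
A digest like this can be checked against a local file with the standard library. The sketch below assumes the digest refers to a single file you have locally (which file it covers is not stated here), and the path in the usage note is hypothetical.

```python
import hashlib

# Digest copied from the line above.
EXPECTED_SHA256 = "65f8edc86399ba3f9e4ba44591d4583f9271f5d1df20e30a913305049559df77"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("full_test_split.jsonl")` would return True only if that (hypothetical) file hashes to the digest above.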

Usage

This model includes custom architecture code. Load it with trust_remote_code=True.

The model was trained with a source token prepended to each input. For arbitrary input where the source dataset is unknown, use the general source token:

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_id = "Pritesh-2711/piibench-deberta-sch"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True is required so the custom hierarchical model class is loaded.
model = AutoModelForTokenClassification.from_pretrained(
    model_id,
    trust_remote_code=True,
)

pipe = pipeline("token-classification", model=model, tokenizer=tokenizer)
# Prepend the general source token because the source dataset is unknown.
result = pipe("[SRC=general] Contact me at jane@example.com.")
print(result)

Transformers may print an informational warning that custom model classes are not in its built-in token-classification support list. The model is loaded correctly when its class is HierarchicalPIIModel; the warning does not mean that a standard DeBERTa classifier head has been substituted.
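
One way to confirm which class was actually instantiated is to inspect the loaded object's class name. The helper below is a minimal sketch; the function name is illustrative, and it relies only on the class name HierarchicalPIIModel mentioned above.

```python
EXPECTED_CLASS_NAME = "HierarchicalPIIModel"


def uses_hierarchical_head(model, expected=EXPECTED_CLASS_NAME):
    """Return True if the loaded object's class name matches the custom class."""
    return type(model).__name__ == expected
```

After loading, `uses_hierarchical_head(model)` should be True. If it is False, the custom architecture code was probably not loaded and the results above should not be expected.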

When evaluating known PIIBench source records, use their associated source token, for example [SRC=nvidia_nemotron] or [SRC=gretel_finance].
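
Because the source token is part of the input string, it can be convenient to wrap the prefixing in a helper and to map predicted character offsets back to the original text. The sketch below is one possible approach, not part of the released code. The function names are illustrative, and it assumes the pipeline returns `start` and `end` character offsets, which the token-classification pipeline typically does with fast tokenizers.

```python
GENERAL_SOURCE = "general"


def source_prefix(source=None):
    """Build the source token prefix, falling back to the general token."""
    return f"[SRC={source or GENERAL_SOURCE}] "


def with_source_token(text, source=None):
    """Prepend the source token to a raw input string."""
    return source_prefix(source) + text


def shift_offsets(entities, source=None):
    """Map pipeline offsets on the prefixed text back to the raw text.

    Predictions without offsets, or starting inside the prefix, are dropped.
    """
    offset = len(source_prefix(source))
    shifted = []
    for entity in entities:
        start, end = entity.get("start"), entity.get("end")
        if start is None or end is None or start < offset:
            continue
        shifted.append({**entity, "start": start - offset, "end": end - offset})
    return shifted
```

For example, `with_source_token("Contact me at jane@example.com.")` yields the same string used in the usage snippet, and `with_source_token(text, "gretel_finance")` builds the input for a record from that source. No validation of source names is done here, since the full list of valid tokens is not given in this card.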

Important Note

Calling:

pipeline("token-classification", model="Pritesh-2711/piibench-deberta-sch")

without trust_remote_code=True does not instantiate the hierarchical head and must not be used to reproduce the reported results.

Related Resources
