townboy/kpfbert-kdpii

Korean PII token-classification model fine-tuned from KPF/KPF-bert-ner on a KDPII-style dialogue dataset.

Dataset

  • Source file: 연대1_PII_dataset_V3.json
  • Documents: 4981
  • Sentences: 53778
  • Positive PII sentences: 19037
  • Label count: 33
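
The counts above imply two derived ratios that are useful when judging class balance. This is a minimal sketch of the arithmetic using only the figures listed; the variable names are illustrative, and it assumes "positive" means a sentence containing at least one PII span.

```python
# Figures copied from the dataset summary above.
documents = 4981
sentences = 53778
positive_sentences = 19037

# Assumption: a "positive" sentence contains at least one PII span.
positive_ratio = positive_sentences / sentences
# Average number of sentences per document.
sentences_per_doc = sentences / documents

print(f"positive ratio: {positive_ratio:.1%}")              # 35.4%
print(f"sentences per document: {sentences_per_doc:.1f}")   # 10.8
```

Roughly a third of the sentences carry PII, so sentence-level accuracy alone can look good even for a weak detector; entity-level metrics are likely more informative.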

Training Setup

  • Max length: 128
  • Epochs: 4.0
  • Learning rate: 2e-05
  • Train batch size: 8
  • Eval batch size: 8
  • Device: cuda
  • GPU: NVIDIA GeForce RTX 4060 Ti
  • Mixed precision: auto
  • Gradient checkpointing: True
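
For readers who want to reproduce a similar run, the settings above map roughly onto Hugging Face `TrainingArguments` keyword arguments. This is a sketch, not the author's training script: the helper name `training_kwargs`, the output path, and the reading of "Mixed precision: auto" as "fp16 when CUDA is available" are all assumptions.

```python
def training_kwargs(cuda_available: bool) -> dict:
    """Keyword arguments mirroring the training setup listed above."""
    return {
        "num_train_epochs": 4.0,
        "learning_rate": 2e-5,
        "per_device_train_batch_size": 8,
        "per_device_eval_batch_size": 8,
        "gradient_checkpointing": True,
        # Assumption: "auto" mixed precision means fp16 whenever CUDA is present.
        "fp16": cuda_available,
    }


# Max length is applied at tokenization time, not through TrainingArguments.
TOKENIZER_KWARGS = {"truncation": True, "max_length": 128}

kwargs = training_kwargs(cuda_available=True)
print(kwargs)

# Usage (requires transformers; output_dir is a hypothetical path):
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="kpfbert-kdpii-out", **kwargs)
```

Gradient checkpointing trades extra compute for lower activation memory, which is a common choice on a consumer GPU such as the one listed.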

Intended Use

This model is intended for detecting personally identifiable information (PII) in Korean dialogue-like text. Typical labels cover names, nicknames, account numbers, mobile numbers, email addresses, postal addresses, IDs, and related sensitive entities.

Quick Inference

from transformers import pipeline

pipe = pipeline(
    "token-classification",
    model="townboy/kpfbert-kdpii",
    aggregation_strategy="simple",
)

print(pipe("Phone 010-8661-5573, ID wanderingrabbit1"))
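
With `aggregation_strategy="simple"`, the pipeline returns a list of dicts containing `entity_group`, `score`, `start`, and `end`. A common next step is to mask the detected spans. The sketch below uses a hypothetical helper, `redact`, and a hand-written stand-in for the pipeline output; the label names and scores are made up for illustration and are not the model's actual labels.

```python
def redact(text: str, entities: list[dict], min_score: float = 0.5) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"] :]
    return text


text = "Phone 010-8661-5573, ID wanderingrabbit1"
# Hand-written stand-in for pipeline output; labels and scores are hypothetical.
sample = [
    {"entity_group": "PHONE", "score": 0.99, "start": 6, "end": 19},
    {"entity_group": "ID", "score": 0.98, "start": 24, "end": 40},
]
print(redact(text, sample))  # Phone [PHONE], ID [ID]
```

In real use, pass `pipe(text)` in place of `sample`. The `min_score` threshold of 0.5 is an arbitrary default and should be tuned on your own data.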

Notes

  • The classification head of KPF/KPF-bert-ner was reinitialized to match the 33-label KDPII label space, so the head is trained from scratch during fine-tuning.
  • Validate this checkpoint on your own target product traffic before using it in production.
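
Reinitializing the head for a new label space can be sketched with the `transformers` API as follows. This is an illustration, not the author's training code: the helper names are hypothetical, the entity type names are placeholders, and the BIO tagging scheme with 16 entity types is an assumption that happens to yield 33 labels.

```python
def bio_label_maps(entity_types: list[str]) -> tuple[dict, dict]:
    """Build id2label / label2id for a BIO scheme: O plus B-/I- per type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels += [f"B-{entity_type}", f"I-{entity_type}"]
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_for_finetuning(id2label: dict, label2id: dict):
    """Load the base encoder and attach a freshly initialized head."""
    from transformers import AutoModelForTokenClassification

    return AutoModelForTokenClassification.from_pretrained(
        "KPF/KPF-bert-ner",
        num_labels=len(id2label),
        id2label=id2label,
        label2id=label2id,
        # Drop the base model's head weights, whose shape no longer matches.
        ignore_mismatched_sizes=True,
    )


# Placeholder type names; the real KDPII label names are not listed here.
entity_types = [f"TYPE_{i}" for i in range(16)]
id2label, label2id = bio_label_maps(entity_types)
print(len(id2label))  # 33
```

Calling `load_for_finetuning(id2label, label2id)` downloads the base model, so it is left uncalled here.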

Model size: 0.1B params (Safetensors, tensor type F32)