Lite-BERT-SL: Sequence Labeling for HiFi-KPI Lite
Lite-BERT-SL is a BERT-based sequence labeling model fine-tuned on the HiFi-KPI Lite dataset. This model was introduced in the paper HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings.
Model Description
The model is designed for the hierarchical extraction of Key Performance Indicators (KPIs) from financial earnings filings (SEC 10-K and 10-Q reports). While the full HiFi-KPI dataset contains a massive taxonomy of iXBRL tags, Lite-BERT-SL is fine-tuned on a manually curated subset focusing on four expert-mapped KPI clusters:
Revenues
Earnings
EPS (Earnings Per Share)
EBIT (Earnings Before Interest and Taxes)
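Labeling these four clusters as token-level spans is commonly done with a BIO tagging scheme. The sketch below builds such a label set for illustration; the actual tag names and `id2label` mapping of Lite-BERT-SL are an assumption here, not taken from the released model config.

```python
# Hypothetical BIO label scheme for the four expert-mapped KPI clusters.
# The real model's label names may differ; check its config before relying on these.
CLUSTERS = ["Revenues", "Earnings", "EPS", "EBIT"]

# "O" for tokens outside any KPI, plus B-/I- tags per cluster.
LABELS = ["O"] + [f"{prefix}-{cluster}" for cluster in CLUSTERS for prefix in ("B", "I")]

id2label = dict(enumerate(LABELS))
label2id = {label: i for i, label in id2label.items()}

print(LABELS)  # 9 labels in total: "O" plus B-/I- for each of the 4 clusters
```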
Developed by: Rasmus Aavang, Giovanni Rizzi, Rasmus Bøggild, Alexandre Iolov, Mike Zhang, Johannes Bjerva
Model type: Token Classification (Sequence Labeling)
Base model: bert-base-uncased
Language: English
Use Cases
- Identifying and extracting generalized financial KPIs from earnings filings.
- Automating the parsing of SEC 10-K and 10-Q reports for structured data extraction.
- Assisting in the alignment of financial text with iXBRL taxonomies.
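For downstream use, per-token BIO predictions (for example, from a standard token-classification pipeline) typically need to be merged into contiguous KPI spans. A minimal post-processing sketch, assuming BIO-style tags named after the KPI clusters (an assumption, not the model's documented label set):

```python
def bio_to_spans(tokens, tags):
    """Merge token-level BIO tags into (kpi_type, span_text) pairs.

    Assumes tags look like "B-Revenues" / "I-Revenues" / "O"; the exact
    tag names are hypothetical and may differ from the model's config.
    """
    spans, cur_type, cur_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur_type:  # close any open span before starting a new one
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = tag[2:], [token]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            cur_tokens.append(token)  # continue the current span
        else:
            if cur_type:  # "O" tag or inconsistent "I-" ends the span
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = None, []
    if cur_type:
        spans.append((cur_type, " ".join(cur_tokens)))
    return spans

tokens = ["Total", "revenues", "were", "$", "4.2", "billion"]
tags = ["B-Revenues", "I-Revenues", "O", "O", "O", "O"]
print(bio_to_spans(tokens, tags))  # [("Revenues", "Total revenues")]
```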
Performance
According to the paper, encoder-based models achieve a macro-F1 above 0.906 on the HiFi-KPI Lite classification task. For detailed performance metrics, please refer to the paper and the HiFi-KPI Lite dataset page.
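Macro-F1 averages the per-class F1 scores with equal weight, so rare KPI classes count as much as frequent ones. A self-contained sketch of the metric (toy inputs for illustration, not results from the paper):

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return sum(scores) / len(scores)

# Toy example with two classes; each class's F1 is 2/3, so macro-F1 is 2/3.
y_true = ["Revenues", "EPS", "Revenues"]
y_pred = ["Revenues", "EPS", "EPS"]
print(macro_f1(y_true, y_pred, ["Revenues", "EPS"]))
```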
Dataset & Code
- Paper: HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings
- Dataset: HiFi-KPI Lite on Hugging Face
- Code: Official HiFi-KPI GitHub Repository
Citation
If you use this model or the dataset in your research, please cite:
@article{aavang2025hifikpi,
title={HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings},
author={Aavang, Rasmus and Rizzi, Giovanni and B{\o}ggild, Rasmus and Iolov, Alexandre and Zhang, Mike and Bjerva, Johannes},
journal={arXiv preprint arXiv:2502.15411},
year={2025}
}