---
language: en
license: other
tags:
- finance
- risk-relation
- retrieval
- encoder
- feature-extraction
- stock-prediction
pipeline_tag: feature-extraction
---

# Financial Risk Identification through Dual-view Adaptation — Encoder

This repository hosts the pretrained encoder from the work **“Financial Risk Identification through Dual-view Adaptation.”**
The model is designed to uncover **inter-firm risk relations** from financial text. It supports downstream tasks such as **retrieval**, **relation mining**, and **stock-signal experiments** in which relation strength serves as a feature.

> **Files**
> - `pytorch_model.safetensors`: model weights
> - `config.json`: model configuration
> - `README.md`: this file

---

## ✨ What’s special (Dual-view Adaptation)

The model aligns two complementary “views” of firm relations and adapts them during training:

- **Lexical view (`lex`)**: focuses on token- and phrase-level cues and on domain terms common in 10-K filings and financial news.
- **Temporal view (`time`)**: encourages relations to stay stable and consistent across reporting periods and evolving events.

A **two-view combination (“Best”)** integrates both signals. It yields better retrieval quality and more stable risk-relation estimates than either view alone. The single-view ablations (`lex`, `time`) are also supported for analysis.

---

## 🔧 Intended Use

- **Feature extraction / sentence embeddings** for paragraphs, sections, or documents in financial filings.
- **Retrieval & ranking**: compute similarities between queries (e.g., “supply chain risk for X”) and candidate passages, then rank the passages by score.
- **Risk-relation estimation**: aggregate cross-document similarities into pairwise firm relation scores for downstream analytics.
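
Retrieval and ranking reduce to cosine similarity between one query embedding and many candidate-passage embeddings. The sketch below is minimal and assumes the embeddings are already computed, for example with the mean-pooling recipe in the Quickstart. It expects a 1-D query tensor and a 2-D passage tensor. The helper name `rank_passages` and the top-`k` cutoff are illustrative only and are not shipped with this model.

```python
import torch
import torch.nn.functional as F


def rank_passages(query_emb: torch.Tensor, passage_embs: torch.Tensor, k: int = 5):
    """Rank candidate passages by cosine similarity to a query.

    Hypothetical helper for illustration; not part of this repository.
    query_emb:    tensor of shape (dim,), e.g. a mean-pooled query embedding
    passage_embs: tensor of shape (num_passages, dim)
    Returns (scores, indices) of the top-k passages, highest score first.
    """
    q = F.normalize(query_emb.unsqueeze(0), p=2, dim=1)  # (1, dim), unit length
    p = F.normalize(passage_embs, p=2, dim=1)            # (num_passages, dim), unit length
    sims = (q @ p.T).squeeze(0)                          # cosine similarities, (num_passages,)
    k = min(k, sims.numel())                             # never ask for more than exist
    scores, indices = torch.topk(sims, k)                # sorted in descending order
    return scores, indices


# Toy example: random vectors stand in for real embeddings
torch.manual_seed(0)
passages = torch.randn(4, 8)
query = passages[2] + 0.01 * torch.randn(8)  # almost identical to passage 2
scores, indices = rank_passages(query, passages, k=2)
print(indices.tolist(), scores.tolist())
```

With real data, `passage_embs` would be the stacked embeddings of the candidate passages and `query_emb` the embedding of a query such as “supply chain risk for X”. Both should be encoded with the same tokenizer settings and pooling.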

> ⚠️ This is not a generative LLM: use it as an **encoder** (feature extractor).

---

## 🚀 Quickstart (Transformers)

```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "william0816/Dual_View_Financial_Encoder"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()  # from_pretrained already returns eval mode; kept explicit for clarity


def mean_pool(last_hidden_state, attention_mask):
    # Mean-pool token embeddings, ignoring padding positions via the attention mask
    mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)  # avoid division by zero
    return summed / counts


texts = [
    "The company faces supplier concentration risk due to a single-source vendor.",
    "Management reported foreign exchange exposure impacting Q4 margins.",
]

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**enc)
embeddings = mean_pool(outputs.last_hidden_state, enc["attention_mask"])

# Cosine similarity for retrieval: L2-normalize, then take pairwise dot products
emb_norm = torch.nn.functional.normalize(embeddings, p=2, dim=1)
similarity = emb_norm @ emb_norm.T  # shape: (len(texts), len(texts))
print(similarity)
```
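
The risk-relation estimation use described above needs one more step: passage-level similarities like the matrix printed above must be collapsed into a single score per firm pair. This card does not state which aggregation the paper uses. The sketch below therefore assumes the simplest choice, the mean cosine similarity over all cross-firm passage pairs. The `relation_scores` helper, the `firm_embeddings` dictionary, and the tickers are all hypothetical. In practice each tensor would hold one firm's mean-pooled passage embeddings, computed as in the Quickstart.

```python
import itertools

import torch
import torch.nn.functional as F


def relation_scores(firm_embeddings: dict[str, torch.Tensor]) -> dict[tuple[str, str], float]:
    """Aggregate passage similarities into pairwise firm relation scores.

    Hypothetical helper for illustration; not part of this repository.
    firm_embeddings maps a firm id to a (num_passages, dim) tensor.
    ASSUMPTION: a pair's score is the mean cosine similarity over all
    cross-firm passage pairs; the paper's aggregation may differ.
    """
    normed = {firm: F.normalize(emb, p=2, dim=1) for firm, emb in firm_embeddings.items()}
    scores = {}
    for a, b in itertools.combinations(sorted(normed), 2):
        cross = normed[a] @ normed[b].T  # (passages_a, passages_b) cosine matrix
        scores[(a, b)] = cross.mean().item()
    return scores


# Toy data: "AAA" and "BBB" share one direction, "CCC" is unrelated noise
torch.manual_seed(0)
shared = torch.randn(16)
firm_embeddings = {
    "AAA": shared + 0.1 * torch.randn(3, 16),
    "BBB": shared + 0.1 * torch.randn(2, 16),
    "CCC": torch.randn(4, 16),
}
scores = relation_scores(firm_embeddings)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A mean over all passage pairs is the simplest aggregate. Taking the max, or the mean of the top-k pairs, instead emphasizes the single strongest shared risk mention, which matters when most passages in a filing are boilerplate.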

---

## 🖇️ Citation

If you use this model or the dual-view methodology, please cite:

```bibtex
@misc{financial_risk_dualview_2025,
  title        = {Financial Risk Identification through Dual-view Adaptation},
  author       = {Chiu, Wei-Ning and collaborators},
  year         = {2025},
  note         = {Preprint/Project},
  howpublished = {\url{https://huggingface.co/william0816/Dual_View_Financial_Encoder}}
}
```
