D2LLM: Decomposed and Distilled Large Language Models for Semantic Search • Paper • 2406.17262 • Published Jun 25, 2024 • 6
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning • Paper • 2409.06679 • Published Sep 10, 2024 • 5
SOLAR: Self-supervised Joint Learning for Symmetric Multimodal Retrieval • Paper • 2605.15868 • Published 21 days ago • 9
Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning • Paper • 2605.30039 • Published 7 days ago • 17
ML-Embed: Inclusive and Efficient Embeddings for a Multilingual World • Paper • 2605.15081 • Published 22 days ago • 11
Beyond Retrieval: A Multitask Benchmark and Model for Code Search • Paper • 2605.04615 • Published about 1 month ago • 23
TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale • Paper • 2604.21889 • Published Apr 23 • 12
QuitoBench: A High-Quality Open Time Series Forecasting Benchmark • Paper • 2603.26017 • Published Mar 27 • 31
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World • Paper • 2603.19223 • Published Mar 19 • 34
C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling • Paper • 2512.21332 • Published Dec 24, 2025 • 17
F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data • Paper • 2510.02294 • Published Oct 2, 2025 • 48
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions • Paper • 2410.06577 • Published Oct 9, 2024 • 14
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM • Paper • 2503.17793 • Published Mar 22, 2025 • 24
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks • Paper • 2505.16901 • Published May 22, 2025 • 48