AIGENCY V4
AIGENCY V4 is a sovereign, fully independent, multimodal 128B-parameter model: a globally competitive Turkish-first AI that is world-leading on Turkish reading comprehension and natural-language inference, frontier-level on grade-school math and scientific reasoning, and KVKK-resident.
🇹🇷 Türkçe README · 🇬🇧 English README · 📄 Whitepaper (EN) · 📄 Whitepaper (TR) · 🌐 Try the demo · 🔗 API
English
Model summary
AIGENCY V4 is the multimodal successor to AIGENCY V3, developed by eCloud Yazılım Teknolojileri and released to production in Q2 2026. The model retains V3's four sovereignty principles — zero external parameter dependency, sovereign data residency, transparent architectural documentation, and Turkish morphological context fidelity — and adds a sovereign 8B-parameter vision encoder for image, document, chart, and visual-math understanding.
| Specification | Value |
|---|---|
| Total parameters | 128B (120B core + 8B vision encoder) |
| Architecture | Sovereign decoder-only transformer + side vision encoder |
| Optimisations | Adaptive LoRA+, Selective Layer Collapse, Localised MoE, 4-bit block quantization, chunked attention |
| Context window | 278K tokens (HBM 3-tier: STM 4k / ITM 64k / LTM 278k) |
| Active inference memory | ~6.5 GB GPU under 4-bit quant |
| Languages | Turkish (primary), English |
| Modalities | Text, image (one image per request, 30 MB max, image/* MIME) |
| Release version | 1.0 production |
| Release date | April 2026 |
| Licence | API-only commercial — see https://aigency.dev/license |
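The image constraints above (one image per request, 30 MB cap, image/* MIME) can be pre-checked client-side before uploading. A minimal sketch; whether the limit is decimal or binary megabytes is not specified on this page, so binary megabytes are assumed, and the function name is illustrative rather than part of any SDK:

```python
MAX_IMAGE_BYTES = 30 * 1024 * 1024  # 30 MB cap from the spec table (binary MB assumed)

def validate_image(mime_type: str, size_bytes: int, images_in_request: int) -> None:
    """Raise ValueError if a request would violate the documented image limits."""
    if images_in_request > 1:
        raise ValueError("only one image per request is accepted")
    if not mime_type.startswith("image/"):
        raise ValueError(f"unsupported MIME type: {mime_type}")
    if size_bytes > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 30 MB limit")

validate_image("image/png", 5 * 1024 * 1024, 1)  # a valid request passes silently
```

Rejected requests raise before any bytes leave the client, which avoids burning an API call on a payload the server would refuse anyway.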
Distribution
Weights are not distributed. AIGENCY V4 is accessed exclusively through
the eCloud production API at https://aigency.dev/api/v2. This page provides
the architectural specification, the evaluation methodology, and the full
benchmark results. To try the model interactively, use the
demo Space. For
production access, see aigency.dev.
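The request schema is not documented on this page; the endpoint path `/chat`, the model identifier, and the field names below are illustrative assumptions in a generic chat-completion shape, shown only to indicate how a call against the published base URL might be assembled:

```python
import json

# Hypothetical request builder for the aigency.dev API. Only the base URL
# https://aigency.dev/api/v2 appears on this page; everything else here
# (path, model name, message format) is an assumed, OpenAI-style shape.
def build_chat_request(prompt: str, api_key: str, temperature: float = 0.2) -> dict:
    return {
        "url": "https://aigency.dev/api/v2/chat",  # "/chat" path is an assumption
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": "aigency-v4",  # assumed identifier
            "messages": [{"role": "user", "content": prompt}],
            "temperature": temperature,
        }),
    }

req = build_chat_request("Merhaba", "MY_KEY")
print(req["url"])
```

Consult aigency.dev for the authoritative schema before wiring this into production code.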
Evaluation
A comprehensive single-session evaluation was run against the production API on 27 April 2026: 13,344 real API calls across 22 distinct benchmarks. Every result is reported with a Wilson 95% confidence interval, deterministic subsampling (seed=42), and an open dataset identifier.
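The Wilson 95% intervals in the tables below can be reproduced directly from each accuracy and sample count. A sketch; the 75/198 success count is inferred from the reported GPQA Diamond accuracy (0.3788 × 198 ≈ 75), not taken from the raw logs:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# GPQA Diamond row: 75 of 198 correct (accuracy 0.3788)
lo, hi = wilson_ci(75, 198)
print(round(lo, 3), round(hi, 3))  # reproduces the published [0.314, 0.448]
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly at small n, which matters for the n=24 and n=30 multimodal subsamples reported later.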
Tier 1 — Critical benchmarks (full set)
| Benchmark | Accuracy | Wilson 95% CI | n | Errors |
|---|---|---|---|---|
| HumanEval (pass@1) | 0.8415 | [0.778, 0.889] | 164/164 | 0 |
| IFEval (strict) | 0.8022 | [0.767, 0.834] | 541/541 | 1 |
| GPQA Diamond | 0.3788 | [0.314, 0.448] | 198/198 | 0 |
| Belebele-TR | 0.8733 | [0.850, 0.893] | 900/900 | 0 |
| ARC-Challenge | 0.9488 | [0.935, 0.960] | 1172/1172 | 0 |
| TruthfulQA MC1 | 0.7638 | [0.734, 0.792] | 817/817 | 0 |
| GSM8K | 0.9462 | [0.933, 0.957] | 1319/1319 | 0 |
Tier 2 — Mid-volume
| Benchmark | Accuracy | Wilson 95% CI | n |
|---|---|---|---|
| MMLU (stratified) | 0.8010 | [0.775, 0.825] | 1000/1000 |
| MMLU-Pro | 0.5020 | [0.471, 0.533] | 1000/1000 |
| HellaSwag | 0.8860 | [0.865, 0.904] | 1000/1000 |
| WinoGrande XL | 0.7466 | [0.722, 0.770] | 1267/1267 |
| HumanEval+ (extended) | 0.7988 | [0.731, 0.853] | 164/164 |
| MBPP (sanitized) | 0.8482 | [0.799, 0.887] | 257/257 |
| MBPP+ | 0.7804 | [0.736, 0.819] | 378/378 |
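The code rows (HumanEval, MBPP and their variants) report pass@1. With a single sample per task, pass@1 is simply the fraction of tasks whose sample passes; the standard unbiased estimator from the HumanEval paper generalises this to k > 1 samples per task. A sketch; the 138/164 count is chosen to be consistent with the reported 0.8415, not taken from the raw logs:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per task, c of which pass.
    Equals 1 - C(n-c, k) / C(n, k); averaged over tasks in a full benchmark run."""
    if n - c < k:
        return 1.0  # fewer failures than draws: at least one pass is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 the estimator reduces to the plain pass rate c/n,
# e.g. 138 passing samples out of 164 on HumanEval:
print(round(pass_at_k(164, 138, 1), 4))
```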
Tier 3-A — Turkish (V4 is the de facto global reference)
| Benchmark | Accuracy | Wilson 95% CI | n |
|---|---|---|---|
| Belebele-TR | 0.8733 | [0.850, 0.893] | 900/900 |
| TQuAD (F1 ≥ 0.5) | 0.8240 | [0.788, 0.855] | 500/500 |
| TR-MMLU | 0.7080 | [0.667, 0.746] | 500/500 |
| XNLI-TR | 0.7340 | [0.694, 0.771] | 500/500 |
| TR Grammar (synthetic) | 0.7900 | [0.700, 0.858] | 100/100 |
Frontier models do not consistently publish Turkish-specific scores; among models with published Turkish-language results, AIGENCY V4 sets the reference.
Tier 3-B — Multimodal (first production release)
| Benchmark | Accuracy | Wilson 95% CI | n |
|---|---|---|---|
| MMMU (val) | 0.5333 | [0.361, 0.698] | 30/30 |
| ChartQA (relaxed) | 0.6768 | [0.634, 0.717] | 492/500 |
| DocVQA (ANLS ≥ 0.5) | 0.7917 | [0.595, 0.908] | 24 |
| MathVista (testmini) | 0.3413 | [0.280, 0.408] | 208 |
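The DocVQA row is scored with ANLS (Average Normalized Levenshtein Similarity) at a 0.5 threshold. A single-answer sketch, assuming the common case-insensitive, whitespace-trimmed normalisation; the official scorer additionally takes the maximum over several reference answers per question:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic row-based dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def anls(prediction: str, truth: str, threshold: float = 0.5) -> float:
    """Normalized Levenshtein similarity, zeroed below the threshold (DocVQA scoring)."""
    p, t = prediction.strip().lower(), truth.strip().lower()
    if not p and not t:
        return 1.0
    sim = 1.0 - levenshtein(p, t) / max(len(p), len(t))
    return sim if sim >= threshold else 0.0

print(anls("12 April", "12 april"))
```

The threshold is what the table's "ANLS ≥ 0.5" notation refers to: near-misses below 0.5 similarity score zero rather than partial credit.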
Comparison with frontier (April 2026)
| Benchmark | AIGENCY V4 | GPT-5 | Claude 4.6/4.7 | Gemini 3 Pro |
|---|---|---|---|---|
| GSM8K | 94.62 | 96.8 | ~96 | ~94 |
| ARC-Challenge | 94.88 | ~96 | ~96 | ~95 |
| HumanEval | 84.15 | 94.0 | 95.0 | 89.7 |
| MMLU | 80.10 | 94.2 | 88-93 | 92.4 |
| MMLU-Pro | 50.20 | ~85 | ~84 | ~81 |
| GPQA Diamond | 37.88 | 88-94 | 91.3-94.2 | 91.9 |
| MMMU | 53.33 | 79.1 | 84.1 | — |
V4 is at frontier level on grade-school math and scientific reasoning, upper-mid frontier on code generation, lower-mid frontier on general academic knowledge and instruction following, and in active development on graduate-level expert knowledge and multimodal reasoning. The V4.1 roadmap (Q4 2026) targets MMLU-Pro 0.65, GPQA Diamond 0.55, and an average latency of 4 s.
Operational performance (single-session, 27 April 2026)
- Total API calls: 13,344
- Persistent error rate: 0.3%
- Average latency: 9.55 s · p50 4.39 s · p95 32.77 s · p99 33.59 s
- V4.1 latency target: average ≤ 4 s · p95 ≤ 15 s
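The p50/p95/p99 figures above can be derived from raw per-call timings. A sketch using the nearest-rank convention (one of several percentile conventions; the samples below are illustrative, not this session's raw data):

```python
# Hypothetical latency samples in seconds, sorted ascending.
latencies = sorted([4.1, 4.4, 4.6, 5.0, 7.2, 9.8, 12.5, 20.1, 31.0, 33.5])

def percentile(sorted_samples: list[float], q: float) -> float:
    """Nearest-rank percentile over pre-sorted samples; other tools interpolate."""
    idx = max(0, int(round(q / 100 * len(sorted_samples))) - 1)
    return sorted_samples[idx]

print(percentile(latencies, 50), percentile(latencies, 95), percentile(latencies, 99))
```

The wide gap between p50 and p95 in the reported numbers (4.39 s vs 32.77 s) is why the table quotes tail percentiles alongside the mean: multimodal calls with vision-encoder overhead dominate the tail.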
Reproducibility
Full evaluation harness, raw responses, scored items, summary JSON, and the deterministic subsample seed are available at:
- Benchmark code: https://github.com/ecloud-bh/aigency-benchmarks
- Evaluation results dataset: https://huggingface.co/datasets/aigencydev/aigency-v4-evaluation
- Whitepaper (EN/TR): https://github.com/ecloud-bh/aigency-v4-whitepaper
Intended use
Primary deployment domains:
- Public-sector and government workloads requiring KVKK residency
- Legal and legal-tech (statute search, contract analysis — Tural model integration)
- Education and higher education (Turkish academic, exam prep, course assistants)
- Banking, finance and insurance (Turkish-heavy KYC/AML)
- Healthcare administrative workloads (KVKK-compliant document handling)
- Media, publishing and editorial (Turkish grammar precision)
- Defence and critical infrastructure (sovereign architecture)
- Software, R&D and engineering (code generation, large-codebase analysis)
Out-of-scope or non-recommended:
- Clinical diagnosis or medical advice (administrative use only)
- Autonomous critical decisions without human review
- Graduate-level scientific research where GPQA-Diamond–class accuracy is required (use frontier model + V4 hybrid)
- High-fidelity multimodal reasoning where MMMU > 75 is required (await V4.1)
Safety and compliance
- KVKK §5 / §12 (Turkish PDPA) compliant — KVKK-resident hosting (TR DC)
- ISO/IEC 27001 — IT-ISMS, risk and control matrix
- NIST SP 800-207 (Zero-Trust) — mTLS, least privilege, continuous monitoring
- EU AI Act (ratified 2025) — high-risk classification with model card
- Memory encryption: AES-256-XTS (RAM), ChaCha20-Poly1305 (LTM disk)
- Image cache: AES-256-GCM, 30 MB limit, 24h TTL
- Pre-encoding visual safety filter + post-encoding output check
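The image-cache policy above (30 MB cap, 24 h TTL) can be mirrored by a small size- and time-bounded store. A minimal sketch of the stated parameters only; the real cache's eviction order and its AES-256-GCM encryption layer are not documented here and are omitted:

```python
import time

class TtlCache:
    """Size- and TTL-bounded cache mirroring the documented 30 MB / 24 h policy."""

    def __init__(self, max_bytes=30 * 1024 * 1024, ttl_s=24 * 3600, clock=time.monotonic):
        self.max_bytes, self.ttl_s, self.clock = max_bytes, ttl_s, clock
        self.items = {}  # key -> (payload, stored_at)

    def put(self, key: str, payload: bytes) -> None:
        if len(payload) > self.max_bytes:
            raise ValueError("payload exceeds cache limit")
        self.items[key] = (payload, self.clock())

    def get(self, key: str):
        entry = self.items.get(key)
        if entry is None:
            return None
        payload, stored_at = entry
        if self.clock() - stored_at > self.ttl_s:
            del self.items[key]  # expired: evict lazily on read
            return None
        return payload
```

The injectable `clock` parameter makes the expiry behaviour testable without waiting 24 hours.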
Known limitations
- GPQA Diamond / MMLU-Pro gap — 35-50pp behind frontier; graduate-level expert knowledge is a V4.1 target.
- First-generation multimodal — vision encoder is 8B; V4.1 plans to scale to 16B.
- Latency 2-3× frontier — vision-encoder overhead, multimodal safety filter; V4.1 targets ≤ 4 s avg.
- Multimodal subsample size — DocVQA n=24, MMMU n=30 (HF cache constraints); CIs are wide.
- Multilingual non-TR evaluation not published — global-scale claim is currently Turkish-anchored.
Citation
@techreport{aigency-v4-2026,
  title       = {AIGENCY V4: Sovereign, Fully Independent and Multimodal 128B-Parameter AI Architecture},
  author      = {{eCloud Yaz{\i}l{\i}m Teknolojileri}},
  year        = {2026},
  month       = apr,
  institution = {eCloud Yaz{\i}l{\i}m Teknolojileri},
  url         = {https://github.com/ecloud-bh/aigency-v4-whitepaper},
  note        = {Whitepaper v1.0, April 2026}
}
Turkish
Model summary
AIGENCY V4 is the 128-billion-parameter sovereign AI model developed by eCloud Yazılım Teknolojileri, the multimodal successor to V3. It entered production in Q2 2026. It preserves V3's four independence principles (zero external parameters, local data sovereignty, transparent documentation, Turkish context fidelity) and extends them with an 8B-parameter sovereign vision encoder that adds visual understanding, document question answering, chart interpretation, and visual-math capabilities.
| Specification | Value |
|---|---|
| Total parameters | 128B (120B core + 8B vision encoder) |
| Architecture | Sovereign decoder-only transformer + side vision encoder |
| Optimisations | Adaptive LoRA+, Selective Layer Collapse, L-MoE, 4-bit block quantization, chunked attention |
| Context window | 278K tokens (HBM 3-tier: STM 4k / ITM 64k / LTM 278k) |
| Active inference memory | ~6.5 GB GPU under 4-bit quantization |
| Languages | Turkish (primary), English |
| Modalities | Text, image (one image per request, max 30 MB, image/* MIME) |
| Version | 1.0 production |
| Release date | April 2026 |
| Licence | API-only commercial (https://aigency.dev/license) |
Distribution
Weights are not shared on HuggingFace. AIGENCY V4 is accessed exclusively
through https://aigency.dev/api/v2. This page provides the architectural
specification, the evaluation methodology, and the full benchmark results.
To try the model interactively, use the demo Space. For production access:
aigency.dev.
Positioning in one sentence
AIGENCY V4 is a fully independent, KVKK-resident sovereign AI model: world-leading in Turkish reading comprehension and natural-language inference, at global frontier level in scientific reasoning and grade-school math, in the upper-mid frontier segment in code generation, and in active development on multimodal and graduate-level expert knowledge.
Intended use domains
- Public sector and government agencies (KVKK requirement)
- Legal and legal-tech (statute search, contract analysis)
- Education and higher education (Turkish academic content, exam preparation)
- Banking, finance and insurance (Turkish-heavy KYC/AML)
- Healthcare administrative workloads (KVKK-compliant document processing)
- Media, publishing and editorial (Turkish grammar precision)
- Defence and critical infrastructure (sovereign architecture)
- Software, R&D and engineering
Known limitations
- GPQA Diamond / MMLU-Pro are 35-50 pp behind the frontier; a V4.1 target.
- First production multimodal release; a 16B vision encoder is planned for V4.1.
- Latency is 2-3x the frontier; V4.1 targets a ≤ 4 s average.
- Small multimodal subsamples (DocVQA n=24, MMMU n=30); CIs are wide.
- Non-Turkish multilingual profile not yet published; the global claim is currently Turkish-anchored.
Citation
@techreport{aigency-v4-2026,
  title       = {AIGENCY V4: Yerli, Tam Ba{\u g}{\i}ms{\i}z ve Multimodal 128B Parametreli Yapay Zek\^a Mimarisi},
  author      = {{eCloud Yaz{\i}l{\i}m Teknolojileri}},
  year        = {2026},
  month       = apr,
  institution = {eCloud Yaz{\i}l{\i}m Teknolojileri},
  url         = {https://github.com/ecloud-bh/aigency-v4-whitepaper}
}
Licence
AIGENCY V4 is offered under the eCloud AIGENCY Commercial Licence (API-only). Model weights are not redistributed. The accompanying whitepaper is licensed under CC BY-ND 4.0, and the benchmark code is licensed under MIT.
For commercial use, partnership, or research collaboration: info@e-cloud.web.tr · ai@aigency.dev · https://aigency.dev
© 2026 eCloud Yazılım Teknolojileri.