---
license: apache-2.0
library_name: ggml
pipeline_tag: depth-estimation
tags:
  - depth-anything
  - depth-anything-3
  - depth-anything-2
  - depth-estimation
  - monocular-depth
  - camera-pose
  - gguf
  - ggml
  - cpp
  - localai
base_model:
  - depth-anything/DA3-SMALL
  - depth-anything/DA3-BASE
  - depth-anything/DA3-LARGE
  - depth-anything/DA3-GIANT
  - depth-anything/DA3MONO-LARGE
  - depth-anything/DA3METRIC-LARGE
  - depth-anything/DA3NESTED-GIANT-LARGE
  - depth-anything/Depth-Anything-V2-Small
  - depth-anything/Depth-Anything-V2-Base
  - depth-anything/Depth-Anything-V2-Large
  - depth-anything/Depth-Anything-V2-Metric-Hypersim-Small
  - depth-anything/Depth-Anything-V2-Metric-Hypersim-Base
  - depth-anything/Depth-Anything-V2-Metric-Hypersim-Large
  - depth-anything/Depth-Anything-V2-Metric-VKITTI-Small
  - depth-anything/Depth-Anything-V2-Metric-VKITTI-Base
  - depth-anything/Depth-Anything-V2-Metric-VKITTI-Large
---

# Depth Anything 3 — GGUF weights for [depth-anything.cpp](https://github.com/mudler/depth-anything.cpp)

**Brought to you by the [LocalAI](https://github.com/mudler/LocalAI) team.**

GGUF conversions of [ByteDance Depth Anything 3](https://github.com/bytedance-seed/depth-anything-3),
for use with **[depth-anything.cpp](https://github.com/mudler/depth-anything.cpp)**, a from-scratch
C++17 / [ggml](https://github.com/ggml-org/ggml) port. Inference needs no Python, PyTorch or CUDA
toolkit: one self-contained GGUF file plus a small native library and CLI. The port is **faster than
PyTorch on CPU** (1.05× to 1.31× in the [Performance](#performance) table, depending on quantization)
and **bit-exact** against the original (correlation 1.0, verified component by component).

Given an image, the engine recovers a dense **depth** map, per-pixel **confidence**, camera
**extrinsics (3×4)** and **intrinsics (3×3)**, an optional **sky** mask, a back-projected **3D point
cloud**, and exports to **glb / COLMAP / PLY**.
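
As an illustration of how a depth map and a 3×3 intrinsics matrix combine into a point cloud, here is a minimal sketch of standard pinhole back-projection. It is not the engine's code: the function name `backproject`, the plain-list data layout and the integer pixel-coordinate convention are assumptions made for this example, and the points are left in the camera frame (no extrinsics applied).

```python
def backproject(depth, K):
    """Back-project a depth map into camera-frame 3D points (pinhole model).

    depth: list of rows (H x W) of z-depth values.
    K:     3x3 intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    Both layouts are assumptions for this sketch, not the engine's API.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    points = []
    for v, row in enumerate(depth):      # v = pixel row
        for u, z in enumerate(row):      # u = pixel column
            if z <= 0:                   # skip pixels with no valid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Each pixel `(u, v)` with depth `z` maps to `((u - cx) * z / fx, (v - cy) * z / fy, z)`; moving the points into a shared world frame would additionally need the 3×4 extrinsics and their convention.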

## Files in this repo

Each GGUF is fully self-contained — every dimension, hyperparameter and preprocessing constant is
baked into the file; the loader reads them, nothing is hardcoded.

| File | Source checkpoint | Backbone | Depth type | Output |
|------|-------------------|----------|-----------|--------|
| `depth-anything-small-f32.gguf` | `DA3-SMALL` | ViT-S | relative | depth + conf + pose |
| `depth-anything-base-f32.gguf` | `DA3-BASE` | ViT-B | relative | depth + conf + pose |
| `depth-anything-base-f16.gguf` | `DA3-BASE` | ViT-B | relative | depth + conf + pose |
| `depth-anything-base-q8_0.gguf` | `DA3-BASE` | ViT-B | relative | depth + conf + pose (near-lossless) |
| `depth-anything-base-q4_k.gguf` | `DA3-BASE` | ViT-B | relative | depth + conf + pose (**99 MB**) |
| `depth-anything-large-f32.gguf` | `DA3-LARGE` | ViT-L | relative | depth + conf + pose |
| `depth-anything-giant-f32.gguf` | `DA3-GIANT` | ViT-g | relative | depth + conf + pose + 3D Gaussians |
| `depth-anything-mono-large-f32.gguf` | `DA3MONO-LARGE` | ViT-L | relative (monocular) | depth + sky |
| `depth-anything-metric-large-f32.gguf` | `DA3METRIC-LARGE` | ViT-L | **metric** | metric depth + sky |
| `depth-anything-nested-anyview.gguf` | `DA3NESTED-GIANT-LARGE` (anyview branch) | ViT-g | relative | depth + conf + pose |
| `depth-anything-nested-metric.gguf` | `DA3NESTED-GIANT-LARGE` (metric branch) | ViT-L | **metric** | depth + sky |

> The nested model is a **two-file pair**: the engine loads the anyview (ViT-g) branch and the
> metric (ViT-L) branch together and aligns them to produce metric-scale depth + pose. Download
> both `depth-anything-nested-anyview.gguf` and `depth-anything-nested-metric.gguf`.

### Depth Anything V2

The same engine also runs [Depth Anything **V2**](https://github.com/DepthAnything/Depth-Anything-V2)
checkpoints. DA2 is **depth only** — no confidence, pose or sky. **Relative** models output an inverse
depth map through a `ReLU` head; **metric** models output depth in **metres** through a
`Sigmoid × max_depth` head (`max_depth=20` for the indoor Hypersim variants, `max_depth=80` for the
outdoor VKITTI variants). The ViT-g (Giant) DA2 checkpoint is not shipped (its `Depth-Anything-V2-Giant`
HF repo is gated/unreleased).
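
The two output heads described above can be sketched per pixel as follows. This is only an illustration of the `ReLU` and `Sigmoid × max_depth` formulas, not the engine's implementation; the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relative_head(x: float) -> float:
    """Relative models: ReLU, giving a non-negative inverse depth."""
    return max(0.0, x)


def metric_head(x: float, max_depth: float) -> float:
    """Metric models: Sigmoid x max_depth, giving metres in [0, max_depth]."""
    return max_depth * sigmoid(x)


# max_depth values taken from the text: 20 (Hypersim, indoor), 80 (VKITTI, outdoor)
print(metric_head(0.0, 20.0))  # 10.0
print(metric_head(0.0, 80.0))  # 40.0
```

Because the sigmoid is bounded, a metric model can never report more than `max_depth`, which is why the indoor and outdoor variants are not interchangeable.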

Each model below ships in f32 plus f16 / q8_0 / q6_k / q5_k / q4_k quants. For brevity the table lists
only the f32 files and a few representative quants; the full set is in `SHA256SUMS`.

| File | Source checkpoint | Backbone | Depth type | Output |
|------|-------------------|----------|-----------|--------|
| `depth-anything2-small-f32.gguf` | `Depth-Anything-V2-Small` | ViT-S | relative | inverse depth |
| `depth-anything2-small-q8_0.gguf` | `Depth-Anything-V2-Small` | ViT-S | relative | inverse depth (near-lossless) |
| `depth-anything2-base-f32.gguf` | `Depth-Anything-V2-Base` | ViT-B | relative | inverse depth |
| `depth-anything2-large-f32.gguf` | `Depth-Anything-V2-Large` | ViT-L | relative | inverse depth |
| `depth-anything2-large-q4_k.gguf` | `Depth-Anything-V2-Large` | ViT-L | relative | inverse depth (smallest) |
| `depth-anything2-metric-hypersim-small-f32.gguf` | `Depth-Anything-V2-Metric-Hypersim-Small` | ViT-S | **metric** (≤20 m, indoor) | depth in metres |
| `depth-anything2-metric-hypersim-base-f32.gguf` | `Depth-Anything-V2-Metric-Hypersim-Base` | ViT-B | **metric** (≤20 m, indoor) | depth in metres |
| `depth-anything2-metric-hypersim-large-f32.gguf` | `Depth-Anything-V2-Metric-Hypersim-Large` | ViT-L | **metric** (≤20 m, indoor) | depth in metres |
| `depth-anything2-metric-vkitti-small-f32.gguf` | `Depth-Anything-V2-Metric-VKITTI-Small` | ViT-S | **metric** (≤80 m, outdoor) | depth in metres |
| `depth-anything2-metric-vkitti-base-f32.gguf` | `Depth-Anything-V2-Metric-VKITTI-Base` | ViT-B | **metric** (≤80 m, outdoor) | depth in metres |
| `depth-anything2-metric-vkitti-large-f32.gguf` | `Depth-Anything-V2-Metric-VKITTI-Large` | ViT-L | **metric** (≤80 m, outdoor) | depth in metres |
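
To check downloaded files against `SHA256SUMS`, something like the sketch below could be used. It assumes the file follows the usual `sha256sum` layout (`<hex digest>  <filename>` per line), which this card does not spell out; `verify` and `sha256_of` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-GB GGUFs are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(sums_file, root="."):
    """Return {filename: matches} for every listed file that exists under root.

    Assumes sha256sum-style lines: "<hex digest>  <filename>".
    Files listed but not downloaded are skipped.
    """
    results = {}
    for line in Path(sums_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        path = Path(root) / name
        if path.exists():
            results[name] = sha256_of(path) == expected.lower()
    return results
```

Calling `verify("models/SHA256SUMS", "models")` would then report only the files actually present in `models`.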

**Parity.** Every DA2 GGUF is verified against the upstream `DepthAnythingV2` forward pass (correlation > 0.999
end-to-end at f32; q8_0 is near-lossless at corr 0.99962, q4_k at 0.99944). The one exception is
`depth-anything2-metric-vkitti-small` at corr **0.9983**. This is **not a porting defect**: the C++ route
matches the reference `Sigmoid × 80` math exactly. The lower score comes from the head amplifying backbone
floating-point rounding noise by up to 20× (80 times the sigmoid's maximum slope of 1/4), the widest metric
scale applied to the smallest backbone. Absolute error stays below 1% (mean 0.57% of 80 m, about 0.46 m),
and the same ViT-S backbone scores 0.9996 in relative mode. We accept this as near-lossless.
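
One way to read the 20× figure: the slope of `max_depth × sigmoid(x)` is `max_depth × s × (1 − s)`, which peaks at `max_depth / 4` when `s = 0.5`. The sketch below checks that bound numerically; it illustrates the arithmetic only and is not taken from the project's test suite, and `metric_gain` is a hypothetical name.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def metric_gain(x: float, max_depth: float) -> float:
    """Derivative of max_depth * sigmoid(x): how much a small error in x is scaled."""
    s = sigmoid(x)
    return max_depth * s * (1.0 - s)


# Scan pre-activations from -10 to 10 and record the largest gain per variant.
grid = [i / 100.0 for i in range(-1000, 1001)]
worst_vkitti = max(metric_gain(x, 80.0) for x in grid)    # outdoor, max_depth=80
worst_hypersim = max(metric_gain(x, 20.0) for x in grid)  # indoor, max_depth=20
print(worst_vkitti, worst_hypersim)  # 20.0 5.0
```

The Hypersim variants, with `max_depth=20`, amplify the same noise by at most 5×, which is consistent with only the VKITTI small model falling below 0.999.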

### Which one should I use?

- **Just trying it out / CPU:** `depth-anything-base-q4_k.gguf` (99 MB, near-lossless).
- **Best quality/speed default:** `depth-anything-base-q8_0.gguf`.
- **Smallest / fastest:** `depth-anything-small-f32.gguf`.
- **Highest quality + 3D reconstruction (point cloud / Gaussians):** `depth-anything-giant-f32.gguf`.
- **Single-image depth with sky mask:** `depth-anything-mono-large-f32.gguf`.
- **Metric-scale depth (metres), single model:** `depth-anything-metric-large-f32.gguf`.
- **Best metric-scale depth + pose:** the nested pair (`depth-anything-nested-anyview.gguf` +
  `depth-anything-nested-metric.gguf`).

## Usage

### depth-anything.cpp (CLI)

```bash
git clone https://github.com/mudler/depth-anything.cpp && cd depth-anything.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j

# download a weight from this repo
hf download mudler/depth-anything.cpp-gguf depth-anything-base-q4_k.gguf --local-dir models

./build/da3 depth models/depth-anything-base-q4_k.gguf image.jpg --out depth.png
./build/da3 depth models/depth-anything-base-q4_k.gguf image.jpg --pose poses.json
./build/da3 reconstruct models/depth-anything-giant-f32.gguf image.jpg --ply cloud.ply

# metric-scale depth from the single metric model
./build/da3 depth models/depth-anything-metric-large-f32.gguf image.jpg --out depth.png

# metric-scale depth + pose from the nested pair (anyview + metric branches)
./build/da3 depth models/depth-anything-nested-anyview.gguf image.jpg \
    --metric-model models/depth-anything-nested-metric.gguf --pfm depth.pfm
```

See the [README](https://github.com/mudler/depth-anything.cpp) for multi-view, glb/COLMAP export,
quantization and the flat C API.
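
For post-processing the `--pfm` output outside the CLI, a minimal reader might look like the sketch below. It assumes the engine writes a standard PFM file (header lines `Pf` or `PF`, `width height`, a scale whose sign encodes endianness, then float32 rows stored bottom to top); this card does not document the file layout, so treat that as an assumption. `read_pfm` is a hypothetical helper.

```python
import struct


def read_pfm(path):
    """Read a standard PFM file and return (width, height, channels, rows).

    rows is a list of lists ordered top to bottom. Assumes the conventional
    PFM layout; whether the engine's output follows it exactly is an assumption.
    """
    with open(path, "rb") as f:
        magic = f.readline().strip()
        if magic not in (b"Pf", b"PF"):
            raise ValueError("not a PFM file")
        channels = 1 if magic == b"Pf" else 3
        width, height = map(int, f.readline().split())
        scale = float(f.readline().strip())
        endian = "<" if scale < 0 else ">"  # negative scale means little-endian
        count = width * height * channels
        data = struct.unpack(f"{endian}{count}f", f.read(4 * count))
    row_len = width * channels
    rows = [list(data[i * row_len:(i + 1) * row_len]) for i in range(height)]
    rows.reverse()  # PFM stores the bottom row first
    return width, height, channels, rows
```

With the nested-pair command above, `read_pfm("depth.pfm")` would return the depth values as floats, avoiding the quantization of an 8-bit PNG.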

### LocalAI

```bash
local-ai run depth-anything-3-base
```

## Performance

Faster than PyTorch on CPU at roughly half the peak memory, and bit-exact. Measured on an AMD Ryzen 9
9950X3D with `threads=16` at 504×336 input, sustained:

| engine | quant | model MB | load ms | infer ms | peak RAM MB | vs PyTorch |
|--------|-------|---------:|--------:|---------:|------------:|-----------:|
| PyTorch | f32 | 516 | 749 | 416.9 | 1328 | 1.00× |
| **C++/ggml** | f32 | 393 | **112** | **346.4** | **614** | **1.20×** |
| **C++/ggml** | q8_0 | 142 | **40** | **319.4** | **363** | **1.31×** |
| **C++/ggml** | q4_k | **99** | **25** | 395.2 | **320** | 1.05× |

Full methodology in [`benchmarks/BENCHMARK.md`](https://github.com/mudler/depth-anything.cpp/blob/master/benchmarks/BENCHMARK.md).
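
The "vs PyTorch" column appears to be the PyTorch inference time divided by the C++/ggml inference time. The short sketch below reproduces it from the table's own numbers, along with the f32 peak-RAM ratio; that the column was computed this way is an inference from the figures, not something stated in the benchmark notes.

```python
# Inference times (ms) and peak RAM (MB) copied from the table above.
pytorch_ms = 416.9
ggml_ms = {"f32": 346.4, "q8_0": 319.4, "q4_k": 395.2}

# Speedup = baseline time / engine time, rounded as in the table.
speedup = {quant: round(pytorch_ms / ms, 2) for quant, ms in ggml_ms.items()}
print(speedup)  # {'f32': 1.2, 'q8_0': 1.31, 'q4_k': 1.05}

# Peak RAM of the f32 C++/ggml run relative to PyTorch.
ram_ratio = round(614 / 1328, 2)
print(ram_ratio)  # 0.46
```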

## License

The GGUF weights are derived from the official Depth Anything 3 checkpoints and inherit their
**Apache-2.0** license. The depth-anything.cpp code is MIT.

## Citation

```bibtex
@article{depthanything3,
  title   = {Depth Anything 3: Recovering the Visual Space from Any Views},
  author  = {ByteDance Seed},
  year    = {2025}
}
```