ELF-B-alpaca-mlx

A single-turn, instruction-following fine-tune of embedded-language-flows/ELF-B-owt (a 105 M-parameter continuous diffusion language model) on the Alpaca 52k dataset, trained entirely on an Apple Silicon GPU with the MLX backend from cjkao/ELF-mlx.

This is a feasibility demonstration, not a production assistant. The point is to show that a continuous-embedding diffusion LM can be steered toward instruction-following with a modest fine-tune on commodity hardware.

What's in this repo

| File | Description |
| --- | --- |
| `checkpoint_26000_final.safetensors` | Final EMA weights, MLX-flattened safetensors layout, ~418 MB. |
| `train_alpaca_ELF-B-mlx.yml` | The exact training config used to produce these weights. |

How to use

Clone the MLX backend repo, drop the checkpoint into a local directory, and run the chat REPL:

git clone https://github.com/cjkao/ELF-mlx
cd ELF-mlx
pip install -r requirements.txt

# Download the checkpoint
mkdir -p src/outputs/elf_b-alpaca-mlx
hf download peterk2023/ELF-B-alpaca-mlx checkpoint_26000_final.safetensors \
  --local-dir src/outputs/elf_b-alpaca-mlx

# Chat
make chat CHECKPOINT=outputs/elf_b-alpaca-mlx

Or single-shot:

make chat CHECKPOINT=outputs/elf_b-alpaca-mlx \
  PROMPT="Write a haiku about coffee."

The prompt format expected by the model is the Alpaca-style one used during training:

Instruction: {your instruction}
Response:

mlx_chat.py builds that prompt for you.
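If you want to build the prompt yourself, for example to batch instructions from a file, the template can be reproduced with a few lines of Python. This is a minimal sketch: `build_prompt` is a hypothetical helper, not a function from ELF-mlx, and the exact whitespace that mlx_chat.py emits may differ.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the Alpaca-style template shown above.

    Hypothetical helper for illustration; the real logic lives in
    mlx_chat.py and may handle whitespace differently.
    """
    return f"Instruction: {instruction.strip()}\nResponse:"


print(build_prompt("Write a haiku about coffee."))
```

The model is expected to fill in the text after `Response:`, so the template deliberately ends there.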

Training details

Base model: embedded-language-flows/ELF-B-owt (continuous flow-matching DLM, 105 M params, T5-small text encoder)
Dataset: tatsu-lab/alpaca (~52k instruction/response pairs, single-turn, English)
Sequence layout: max_length=256 tokens total (max_input_length=192 for the prompt, leaving a 64-token response slot)
Steps: 26 000 (13 000 initial run + 13 000 continuation)
Batch size: 4
Learning rate: 5e-4 (linear warmup 500 steps), AdamW
EMA decay: 0.9999
Decoder branch prob: 0.2 (both diffusion and decoder branches active)
Self-cond prob: 0.5
Label drop prob: 0.1 (enables classifier-free guidance at inference)
Hardware: Apple Silicon GPU via MLX
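
The numbers above imply a few derived quantities that are easy to check by hand. The sketch below works through them in plain Python and also shows what a linear warmup and an EMA update with these settings look like. The function names are hypothetical, the constant learning rate after warmup is an assumption (the list only states the warmup), and the epoch estimate assumes no gradient accumulation and a dataset of roughly 52,000 examples.

```python
# Values copied from the training details above.
MAX_LENGTH = 256
MAX_INPUT_LENGTH = 192
STEPS = 26_000
BATCH_SIZE = 4
BASE_LR = 5e-4
WARMUP_STEPS = 500
EMA_DECAY = 0.9999
DATASET_SIZE = 52_000  # approximate; the card says "~52k"

# Tokens left for the response once the prompt budget is reserved.
RESPONSE_SLOT = MAX_LENGTH - MAX_INPUT_LENGTH

# Sequences seen during training, assuming no gradient accumulation.
SEQUENCES_SEEN = STEPS * BATCH_SIZE
APPROX_EPOCHS = SEQUENCES_SEEN / DATASET_SIZE


def warmup_lr(step: int) -> float:
    """Linear warmup to BASE_LR; constant afterwards (assumed, not stated)."""
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    return BASE_LR


def ema_update(ema: list[float], params: list[float],
               decay: float = EMA_DECAY) -> list[float]:
    """One exponential-moving-average step over a flat list of weights."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


print(RESPONSE_SLOT, SEQUENCES_SEEN, APPROX_EPOCHS)
```

Under these assumptions the run sees 104,000 sequences, which is about two passes over Alpaca, and the response slot works out to the 64 tokens listed under Limitations.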

Recommended inference settings (defaults in make chat):

num_steps=32, cfg_scale=2.0, sampling_method=sde, sde_gamma=1.5
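
To give a sense of what cfg_scale controls, here is a minimal sketch of one common classifier-free guidance formulation, in which the conditional and unconditional predictions are extrapolated at each sampling step. This is an assumption about the general technique, not a description of the ELF-mlx code, which may use a different convention (for example a `1 + w` scale). `cfg_combine` is a hypothetical name, and plain Python lists stand in for model outputs.

```python
def cfg_combine(cond: list[float], uncond: list[float],
                cfg_scale: float = 2.0) -> list[float]:
    """Blend conditional and unconditional predictions.

    cfg_scale = 0 returns the unconditional prediction, 1 returns the
    conditional one, and values above 1 push further toward the condition.
    """
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy example: two-dimensional "predictions" from the same model,
# once with the instruction and once with it dropped.
print(cfg_combine([1.0, 0.5], [0.0, 0.5], cfg_scale=2.0))
```

The unconditional prediction is available because the instruction was dropped for a fraction of training examples (the label drop probability of 0.1 listed under Training details).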

Limitations

Single-turn only. No dialogue history is conditioned on.
Short outputs. Response slot capped at 64 tokens.
Feasibility-grade quality. Outputs are topical and instruction-shaped but not aligned, not factually reliable, and not safety-tuned.
No RLHF, no DPO. Plain supervised fine-tune on Alpaca.
MLX layout. Weights are flattened MLX safetensors; loading them outside the ELF-mlx repo requires reusing mlx_backend/safetensors_load.py.
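
Before writing a custom loader, it can help to inspect the checkpoint's tensor names and shapes. The safetensors container begins with an 8-byte little-endian header length followed by a JSON header, so this needs only the Python standard library. The sketch below is independent of ELF-mlx: the function names are hypothetical, and it only lists the flattened keys without remapping them to any other layout.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # Optional free-form metadata entry; not a tensor.
    header.pop("__metadata__", None)
    return header


def count_parameters(header: dict) -> int:
    """Sum the element counts of every tensor described in the header."""
    total = 0
    for info in header.values():
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total
```

Calling `read_safetensors_header("src/outputs/elf_b-alpaca-mlx/checkpoint_26000_final.safetensors")` and printing each key with its `dtype` and `shape` shows the flattened naming scheme. As a rough sanity check, 105 M parameters at 4 bytes each come to about 420 MB, which would be consistent with the ~418 MB file holding 32-bit floats; that is an inference from the file size, not something stated in this card.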

License

MIT, matching the base model and the ELF-mlx repo.

Citation

If you use the base architecture, please cite the original ELF paper:

@article{elf2025,
  title={ELF: Embedded Language Flows},
  journal={arXiv preprint arXiv:2605.10938},
  year={2026}
}
