ELF-B-alpaca-mlx

A single-turn, instruction-following fine-tune of embedded-language-flows/ELF-B-owt (a 105 M-parameter continuous diffusion language model) on the Alpaca 52k dataset, trained entirely on an Apple Silicon GPU via the MLX backend in cjkao/ELF-mlx.

This is a feasibility demonstration, not a production assistant. The point is to show that a continuous-embedding diffusion LM can be steered toward instruction-following with a modest fine-tune on commodity hardware.

What's in this repo

File                                  Description
checkpoint_26000_final.safetensors    Final EMA weights, MLX-flattened safetensors layout, ~418 MB.
train_alpaca_ELF-B-mlx.yml            The exact training config used to produce these weights.

How to use

Clone the MLX backend repo, drop the checkpoint into a local directory, and run the chat REPL:

git clone https://github.com/cjkao/ELF-mlx
cd ELF-mlx
pip install -r requirements.txt

# Download the checkpoint
mkdir -p src/outputs/elf_b-alpaca-mlx
hf download peterk2023/ELF-B-alpaca-mlx checkpoint_26000_final.safetensors \
  --local-dir src/outputs/elf_b-alpaca-mlx

# Chat
make chat CHECKPOINT=outputs/elf_b-alpaca-mlx

Or single-shot:

make chat CHECKPOINT=outputs/elf_b-alpaca-mlx \
  PROMPT="Write a haiku about coffee."

The model expects the Alpaca-style prompt format used during training:

Instruction: {your instruction}
Response:

mlx_chat.py builds that prompt for you.
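
If you need to build the prompt outside the REPL, the layout above can be reproduced in a few lines. This is a minimal sketch, not the code in mlx_chat.py: the function name build_prompt is hypothetical, and the exact whitespace (a single newline between the fields, nothing after "Response:") is an assumption that should be checked against the repo.

```python
def build_prompt(instruction: str) -> str:
    """Format an instruction in the Alpaca-style layout shown above."""
    # Assumption: one newline separates the two fields and nothing
    # follows "Response:"; verify against mlx_chat.py before relying on it.
    return f"Instruction: {instruction.strip()}\nResponse:"


print(build_prompt("Write a haiku about coffee."))
```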

Training details

  • Base model: embedded-language-flows/ELF-B-owt (continuous flow-matching DLM, 105 M params, T5-small text encoder)
  • Dataset: tatsu-lab/alpaca (~52k instruction/response pairs, single-turn, English)
  • Sequence layout: max_length=256 total (max_input_length=192 prompt, 64 response slot)
  • Steps: 26 000 (13 000 initial run + 13 000 continuation)
  • Batch size: 4
  • Learning rate: 5e-4 (linear warmup 500 steps), AdamW
  • EMA decay: 0.9999
  • Decoder branch prob: 0.2 (both diffusion and decoder branches active)
  • Self-cond prob: 0.5
  • Label drop prob: 0.1 (enables classifier-free guidance at inference)
  • Hardware: Apple Silicon GPU via MLX
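
To make two of the hyperparameters above concrete, here is a rough pure-Python sketch of a linear warmup schedule and an EMA weight update. It is illustrative only and does not reproduce the ELF-mlx training loop: the function names are hypothetical, and holding the learning rate constant after warmup is an assumption, since the schedule after step 500 is not stated here.

```python
def warmup_lr(step: int, base_lr: float = 5e-4, warmup_steps: int = 500) -> float:
    """Linear warmup to base_lr over warmup_steps, then constant.

    Assumption: the rate stays at base_lr after warmup; the actual
    config may apply a decay schedule instead.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


def ema_update(ema: list[float], params: list[float], decay: float = 0.9999) -> list[float]:
    """One EMA step: ema <- decay * ema + (1 - decay) * params."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


print(warmup_lr(0), warmup_lr(249), warmup_lr(10_000))
print(ema_update([0.0, 1.0], [1.0, 1.0]))
```

With a decay of 0.9999, the EMA averages over roughly 1 / (1 - 0.9999) = 10 000 recent steps, which is a sizeable fraction of the 26 000-step run.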

Recommended inference settings (defaults in make chat):

  • num_steps=32, cfg_scale=2.0, sampling_method=sde, sde_gamma=1.5
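
For readers unfamiliar with cfg_scale, the sketch below shows one common way classifier-free guidance combines a conditional and an unconditional model output at each sampling step. Which convention ELF-mlx actually uses is not stated here, so treat the formula and the function name cfg_combine as assumptions; some implementations write the scale as (1 + w) instead.

```python
def cfg_combine(cond: list[float], uncond: list[float], cfg_scale: float = 2.0) -> list[float]:
    """Extrapolate from the unconditional output toward the conditional one.

    Assumed convention: out = uncond + cfg_scale * (cond - uncond),
    so cfg_scale = 1.0 returns the conditional output unchanged.
    """
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy per-element model outputs, for illustration only.
print(cfg_combine([1.0, 0.5], [0.0, 0.5], cfg_scale=2.0))
```

Under this convention, the label drop probability of 0.1 used in training is what gives the model an unconditional mode to extrapolate away from.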

Limitations

  • Single-turn only. No dialogue history is conditioned on.
  • Short outputs. Response slot capped at 64 tokens.
  • Feasibility-grade quality. Outputs are topical and instruction-shaped but not aligned, not factually reliable, and not safety-tuned.
  • No RLHF, no DPO. Plain supervised fine-tune on Alpaca.
  • MLX layout. Weights are flattened MLX safetensors; loading them outside the ELF-mlx repo requires reusing mlx_backend/safetensors_load.py.
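
Before writing a custom loader, it can help to inspect which tensor names and shapes the checkpoint contains. The sketch below reads only the JSON header of a safetensors file using the standard library, relying on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). It does not replace mlx_backend/safetensors_load.py, and the function name is hypothetical.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Optional free-form metadata is stored under this reserved key.
    header.pop("__metadata__", None)
    return header
```

For example, iterating over read_safetensors_header("src/outputs/elf_b-alpaca-mlx/checkpoint_26000_final.safetensors").items() and printing each name with its "dtype" and "shape" shows how the flattened MLX keys are named.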

License

MIT, matching the base model and the ELF-mlx repo.

Citation

If you use the base architecture, please cite the original ELF paper:

@article{elf2025,
  title={ELF: Embedded Language Flows},
  journal={arXiv preprint arXiv:2605.10938},
  year={2025}
}