ELF-B-alpaca-mlx
A single-turn, instruction-following fine-tune of embedded-language-flows/ELF-B-owt (a 105 M-parameter continuous diffusion language model) on the Alpaca 52k dataset, trained entirely on an Apple Silicon GPU via the MLX backend in cjkao/ELF-mlx.
This is a feasibility demonstration, not a production assistant. The point is to show that a continuous-embedding diffusion LM can be steered toward instruction-following with a modest fine-tune on commodity hardware.
What's in this repo
| File | Description |
|---|---|
| `checkpoint_26000_final.safetensors` | Final EMA weights, MLX-flattened safetensors layout, ~418 MB. |
| `train_alpaca_ELF-B-mlx.yml` | The exact training config used to produce these weights. |
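As a quick sanity check, the checkpoint size in the table lines up with the 105 M parameter count if the weights are stored as 32-bit floats. That storage dtype is an assumption, not something stated in this card, and `approx_param_count` is a hypothetical helper used only for this arithmetic:

```python
BYTES_PER_FP32 = 4  # assumption: each weight is stored as a 32-bit float


def approx_param_count(file_size_bytes: float, bytes_per_param: int = BYTES_PER_FP32) -> float:
    """Rough parameter count implied by a checkpoint's size on disk."""
    return file_size_bytes / bytes_per_param


size_mb = 418  # "~418 MB" from the table, read as 10^6 bytes per MB
params_millions = approx_param_count(size_mb * 1e6) / 1e6
print(f"{params_millions:.1f} M parameters")
```

The estimate ignores the small safetensors header and any non-weight entries, so it is only expected to land near the stated parameter count.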
How to use
Clone the MLX backend repo, drop the checkpoint into a local directory, and run the chat REPL:
git clone https://github.com/cjkao/ELF-mlx
cd ELF-mlx
pip install -r requirements.txt
# Download the checkpoint
mkdir -p src/outputs/elf_b-alpaca-mlx
hf download peterk2023/ELF-B-alpaca-mlx checkpoint_26000_final.safetensors \
  --local-dir src/outputs/elf_b-alpaca-mlx
# Chat
make chat CHECKPOINT=outputs/elf_b-alpaca-mlx
Or single-shot:
make chat CHECKPOINT=outputs/elf_b-alpaca-mlx \
  PROMPT="Write a haiku about coffee."
The prompt format expected by the model is the Alpaca-style one used during training:
Instruction: {your instruction}
Response:
`mlx_chat.py` builds that prompt for you.
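If you want to prepare prompts outside the REPL, the template above can be reproduced with a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name, not a function from the ELF-mlx repo, and the exact whitespace used during training may differ from what is assumed here.

```python
def build_prompt(instruction: str) -> str:
    """Hypothetical helper: format one instruction in the two-line layout shown above."""
    # Strip surrounding whitespace so the template stays on exactly two lines.
    return f"Instruction: {instruction.strip()}\nResponse:"


if __name__ == "__main__":
    print(build_prompt("Write a haiku about coffee."))
```

The response slot is left empty after `Response:` because the model is expected to fill it in.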
Training details
- Base model: `embedded-language-flows/ELF-B-owt` (continuous flow-matching DLM, 105 M params, T5-small text encoder)
- Dataset: `tatsu-lab/alpaca` (~52k instruction/response pairs, single-turn, English)
- Sequence layout: `max_length=256` total (`max_input_length=192` prompt, 64 response slot)
- Steps: 26 000 (13 000 initial run + 13 000 continuation)
- Batch size: 4
- Learning rate: 5e-4 (linear warmup 500 steps), AdamW
- EMA decay: 0.9999
- Decoder branch prob: 0.2 (both diffusion and decoder branches active)
- Self-cond prob: 0.5
- Label drop prob: 0.1 (enables classifier-free guidance at inference)
- Hardware: Apple Silicon GPU via MLX
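To make the numbers in the list above concrete, here is a rough sketch of the learning-rate warmup, the EMA update, and the amount of data seen. It is plain Python rather than the repo's MLX code, and it rests on several assumptions: the learning rate is held constant after warmup, no gradient accumulation is used, and the dataset size is rounded to 52 000. `warmup_lr` and `ema_update` are hypothetical names.

```python
def warmup_lr(step: int, peak_lr: float = 5e-4, warmup_steps: int = 500) -> float:
    """Linear warmup to peak_lr; held constant afterwards (assumption)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


def ema_update(ema: list[float], params: list[float], decay: float = 0.9999) -> list[float]:
    """One EMA step: ema <- decay * ema + (1 - decay) * params."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


total_steps = 26_000
batch_size = 4
dataset_size = 52_000  # "~52k" from the list above; the exact count may differ

samples_seen = total_steps * batch_size      # examples processed, without accumulation
approx_epochs = samples_seen / dataset_size  # passes over the dataset

print(samples_seen, approx_epochs)
```

Under these assumptions the run covers roughly two passes over Alpaca. With a decay of 0.9999, the EMA weights average over a window on the order of 10 000 steps, which is a sizeable fraction of the 26 000-step run.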
Recommended inference settings (defaults in `make chat`):
`num_steps=32`, `cfg_scale=2.0`, `sampling_method=sde`, `sde_gamma=1.5`
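The label drop probability used in training and the `cfg_scale` used at inference are two halves of classifier-free guidance. The sketch below illustrates one common convention for it, where a scale of 1.0 recovers the conditional prediction. It is not taken from the ELF-mlx code, which may define the scale differently, and `maybe_drop_label` and `cfg_combine` are hypothetical names.

```python
import random


def maybe_drop_label(condition, drop_prob: float = 0.1, rng=random):
    """Training side: swap the condition for None (the 'null' label) with probability drop_prob."""
    return None if rng.random() < drop_prob else condition


def cfg_combine(cond: list[float], uncond: list[float], cfg_scale: float = 2.0) -> list[float]:
    """Inference side: move from the unconditional prediction toward (and past) the conditional one."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy predictions standing in for the model's conditional and unconditional outputs.
guided = cfg_combine(cond=[1.0, 3.0], uncond=[0.0, 1.0], cfg_scale=2.0)
print(guided)
```

Because the model sees a dropped condition for a fraction of training examples, it can produce both predictions at inference time, at the cost of two forward passes per sampling step.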
Limitations
- Single-turn only. No dialogue history is conditioned on.
- Short outputs. Response slot capped at 64 tokens.
- Feasibility-grade quality. Outputs are topical and instruction-shaped but not aligned, not factually reliable, and not safety-tuned.
- No RLHF, no DPO. Plain supervised fine-tune on Alpaca.
- MLX layout. Weights are flattened MLX safetensors; loading them outside the ELF-mlx repo requires reusing `mlx_backend/safetensors_load.py`.
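Before writing a custom loader, it can help to see which tensor names and shapes the checkpoint actually contains. The sketch below reads only the JSON header of a safetensors file using the standard library, relying on the public file layout (an 8-byte little-endian header length followed by a JSON header). It does not rebuild the model or reproduce what `mlx_backend/safetensors_load.py` does, and the function names are hypothetical.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(header: dict) -> list[tuple[str, str, list[int]]]:
    """List (name, dtype, shape) for each tensor, sorted by name."""
    return sorted((name, info["dtype"], info["shape"]) for name, info in header.items())


# Example call, assuming the download path used earlier in this card:
# for name, dtype, shape in summarize(read_safetensors_header(
#         "src/outputs/elf_b-alpaca-mlx/checkpoint_26000_final.safetensors")):
#     print(name, dtype, shape)
```

The listing shows what is stored, but mapping those flattened names back onto model modules is still the job of the repo's loader.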
License
MIT, matching the base model and the ELF-mlx repo.
Citation
If you use the base architecture, please cite the original ELF paper:
@article{elf2025,
title={ELF: Embedded Language Flows},
journal={arXiv preprint arXiv:2605.10938},
year={2025}
}