Instructions to use lgy0404/MemGUI-8B-SFT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use lgy0404/MemGUI-8B-SFT with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="lgy0404/MemGUI-8B-SFT") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] pipe(text=messages)# Load model directly from transformers import AutoProcessor, AutoModelForMultimodalLM processor = AutoProcessor.from_pretrained("lgy0404/MemGUI-8B-SFT") model = AutoModelForMultimodalLM.from_pretrained("lgy0404/MemGUI-8B-SFT") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] inputs = processor.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use lgy0404/MemGUI-8B-SFT with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "lgy0404/MemGUI-8B-SFT" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "lgy0404/MemGUI-8B-SFT", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }'Use Docker
docker model run hf.co/lgy0404/MemGUI-8B-SFT
- SGLang
How to use lgy0404/MemGUI-8B-SFT with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "lgy0404/MemGUI-8B-SFT" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "lgy0404/MemGUI-8B-SFT", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "lgy0404/MemGUI-8B-SFT" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "lgy0404/MemGUI-8B-SFT", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }' - Docker Model Runner
How to use lgy0404/MemGUI-8B-SFT with Docker Model Runner:
docker model run hf.co/lgy0404/MemGUI-8B-SFT
MemGUI-8B-SFT
MemGUI-8B-SFT is an 8B MemGUI-Agent model trained from Qwen3-VL-8B-Instruct on MemGUI-3K. It is designed for long-horizon mobile GUI control with proactive context management.
The model follows the ConAct Context-as-Action protocol. At each step, it produces a structured response with reasoning, history folding, a UI or memory tool call, a grounded UI observation, and the next action intent. This allows the agent to manage three context fields while acting: Folded Action History, Folded UI State, and Recent Step Record.
Model Details
- Model type: multimodal mobile GUI agent
- Base model:
Qwen/Qwen3-VL-8B-Instruct - Training data:
lgy0404/MemGUI-3K - Training recipe: supervised fine-tuning with ms-swift
- Output protocol: ConAct 5-part structured output
- License: Apache 2.0
Intended Use
MemGUI-8B-SFT is intended for research on mobile GUI agents, long-horizon GUI control, context management, UI memory, and history folding. It can be used as an action policy in mobile GUI environments that provide screenshots and execute structured tool calls.
This model is not a general-purpose chatbot. It expects the MemGUI-Agent system prompt, a screenshot, and a structured mobile GUI context state.
Input and Output Format
The model expects a multimodal conversation with:
- a system prompt defining the MemGUI-Agent tools and response format,
- a user message containing
<image>plus the task goal and structured context, - one screenshot image.
The assistant response follows this order:
<thinking>...</thinking>
<folding>{"range": [start_step, current_step], "summary": "..."}</folding>
<tool_call>{"name": "mobile_use", "arguments": {...}}</tool_call>
<ui_observation>...</ui_observation>
<action_intent>...</action_intent>
For the first step of a trajectory, <folding> is omitted because there is no
previous step to fold.
Evaluation
| Benchmark | Metric | Score |
|---|---|---|
| MemGUI-Bench | Pass@1 | 23.4 |
| MemGUI-Bench | Pass@3 | 35.9 |
| MemGUI-Bench | IRR | 30.2 |
| MobileWorld GUI-Only | Success Rate | 17.9 |
On MemGUI-Bench, MemGUI-8B-SFT improves over the Qwen3-VL-8B-Instruct baseline and achieves the best open-data 8B performance reported in our experiments. On MobileWorld GUI-Only, it transfers beyond the source benchmark and reaches 17.9% success rate.
Dataset
MemGUI-3K contains 2,956 successful mobile GUI trajectories and 64,430 reasonable step-level training samples with ConAct annotations. The dataset includes full trajectories, screenshots, step-level reasonableness annotations, and multimodal training files.
Dataset page: https://huggingface.co/datasets/lgy0404/MemGUI-3K
Citation
@article{memguiagent2026,
title = {MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management},
year = {2026}
}
- Downloads last month
- -
Model tree for lgy0404/MemGUI-8B-SFT
Dataset used to train lgy0404/MemGUI-8B-SFT
Collection including lgy0404/MemGUI-8B-SFT
Evaluation results
- Pass@1 on MemGUI-Benchself-reported23.400
- Pass@3 on MemGUI-Benchself-reported35.900
- Information Retention Rate on MemGUI-Benchself-reported30.200
- Success Rate on MobileWorld GUI-Onlyself-reported17.900