Instructions to use lgy0404/MemGUI-8B-SFT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use lgy0404/MemGUI-8B-SFT with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="lgy0404/MemGUI-8B-SFT")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("lgy0404/MemGUI-8B-SFT")
model = AutoModelForMultimodalLM.from_pretrained("lgy0404/MemGUI-8B-SFT")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use lgy0404/MemGUI-8B-SFT with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lgy0404/MemGUI-8B-SFT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lgy0404/MemGUI-8B-SFT",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/lgy0404/MemGUI-8B-SFT

SGLang

How to use lgy0404/MemGUI-8B-SFT with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "lgy0404/MemGUI-8B-SFT" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lgy0404/MemGUI-8B-SFT",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "lgy0404/MemGUI-8B-SFT" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lgy0404/MemGUI-8B-SFT",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use lgy0404/MemGUI-8B-SFT with Docker Model Runner:
```
docker model run hf.co/lgy0404/MemGUI-8B-SFT
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

MemGUI-8B-SFT

MemGUI-8B-SFT is an 8B MemGUI-Agent model trained from Qwen3-VL-8B-Instruct on MemGUI-3K. It is designed for long-horizon mobile GUI control with proactive context management.

The model follows the ConAct Context-as-Action protocol. At each step, it produces a structured response with reasoning, history folding, a UI or memory tool call, a grounded UI observation, and the next action intent. This allows the agent to manage three context fields while acting: Folded Action History, Folded UI State, and Recent Step Record.

Model Details

Model type: multimodal mobile GUI agent
Base model: Qwen/Qwen3-VL-8B-Instruct
Training data: lgy0404/MemGUI-3K
Training recipe: supervised fine-tuning with ms-swift
Output protocol: ConAct 5-part structured output
License: Apache 2.0

Intended Use

MemGUI-8B-SFT is intended for research on mobile GUI agents, long-horizon GUI control, context management, UI memory, and history folding. It can be used as an action policy in mobile GUI environments that provide screenshots and execute structured tool calls.

This model is not a general-purpose chatbot. It expects the MemGUI-Agent system prompt, a screenshot, and a structured mobile GUI context state.

Input and Output Format

The model expects a multimodal conversation with:

a system prompt defining the MemGUI-Agent tools and response format,
a user message containing <image> plus the task goal and structured context,
one screenshot image.

The assistant response follows this order:

<thinking>...</thinking>
<folding>{"range": [start_step, current_step], "summary": "..."}</folding>
<tool_call>{"name": "mobile_use", "arguments": {...}}</tool_call>
<ui_observation>...</ui_observation>
<action_intent>...</action_intent>

For the first step of a trajectory, <folding> is omitted because there is no previous step to fold.

Evaluation

Benchmark	Metric	Score
MemGUI-Bench	Pass@1	23.4
MemGUI-Bench	Pass@3	35.9
MemGUI-Bench	IRR	30.2
MobileWorld GUI-Only	Success Rate	17.9

On MemGUI-Bench, MemGUI-8B-SFT improves over the Qwen3-VL-8B-Instruct baseline and achieves the best open-data 8B performance reported in our experiments. On MobileWorld GUI-Only, it transfers beyond the source benchmark and reaches 17.9% success rate.

Dataset

MemGUI-3K contains 2,956 successful mobile GUI trajectories and 64,430 reasonable step-level training samples with ConAct annotations. The dataset includes full trajectories, screenshots, step-level reasonableness annotations, and multimodal training files.

Dataset page: https://huggingface.co/datasets/lgy0404/MemGUI-3K

Citation

@article{memguiagent2026,
  title = {MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management},
  year = {2026}
}

Downloads last month: -

Safetensors

Model size

9B params

Tensor type

BF16

Model tree for lgy0404/MemGUI-8B-SFT

Base model

Qwen/Qwen3-VL-8B-Instruct

Finetuned

(298)

this model

Quantizations

1 model

Dataset used to train lgy0404/MemGUI-8B-SFT

Collection including lgy0404/MemGUI-8B-SFT

MemGUI-Agent

Collection

MemGUI-Agent • 2 items • Updated about 21 hours ago

Evaluation results

Pass@1 on MemGUI-Bench
self-reported

23.400
Pass@3 on MemGUI-Bench
self-reported

35.900
Information Retention Rate on MemGUI-Bench
self-reported

30.200
Success Rate on MobileWorld GUI-Only
self-reported

17.900