MemGUI-8B-SFT

MemGUI-8B-SFT is an 8B MemGUI-Agent model trained from Qwen3-VL-8B-Instruct on MemGUI-3K. It is designed for long-horizon mobile GUI control with proactive context management.

The model follows the ConAct Context-as-Action protocol. At each step, it produces a structured response with reasoning, history folding, a UI or memory tool call, a grounded UI observation, and the next action intent. This allows the agent to manage three context fields while acting: Folded Action History, Folded UI State, and Recent Step Record.

Model Details

  • Model type: multimodal mobile GUI agent
  • Base model: Qwen/Qwen3-VL-8B-Instruct
  • Training data: lgy0404/MemGUI-3K
  • Training recipe: supervised fine-tuning with ms-swift
  • Output protocol: ConAct 5-part structured output
  • License: Apache 2.0

Intended Use

MemGUI-8B-SFT is intended for research on mobile GUI agents, long-horizon GUI control, context management, UI memory, and history folding. It can be used as an action policy in mobile GUI environments that provide screenshots and execute structured tool calls.

This model is not a general-purpose chatbot. It expects the MemGUI-Agent system prompt, a screenshot, and a structured mobile GUI context state.

Input and Output Format

The model expects a multimodal conversation with:

  • a system prompt defining the MemGUI-Agent tools and response format,
  • a user message containing <image> plus the task goal and structured context,
  • one screenshot image.

The assistant response follows this order:

<thinking>...</thinking>
<folding>{"range": [start_step, current_step], "summary": "..."}</folding>
<tool_call>{"name": "mobile_use", "arguments": {...}}</tool_call>
<ui_observation>...</ui_observation>
<action_intent>...</action_intent>

For the first step of a trajectory, <folding> is omitted because there is no previous step to fold.

Evaluation

Benchmark Metric Score
MemGUI-Bench Pass@1 23.4
MemGUI-Bench Pass@3 35.9
MemGUI-Bench IRR 30.2
MobileWorld GUI-Only Success Rate 17.9

On MemGUI-Bench, MemGUI-8B-SFT improves over the Qwen3-VL-8B-Instruct baseline and achieves the best open-data 8B performance reported in our experiments. On MobileWorld GUI-Only, it transfers beyond the source benchmark and reaches 17.9% success rate.

Dataset

MemGUI-3K contains 2,956 successful mobile GUI trajectories and 64,430 reasonable step-level training samples with ConAct annotations. The dataset includes full trajectories, screenshots, step-level reasonableness annotations, and multimodal training files.

Dataset page: https://huggingface.co/datasets/lgy0404/MemGUI-3K

Citation

@article{memguiagent2026,
  title = {MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management},
  year = {2026}
}
Downloads last month
-
Safetensors
Model size
9B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for lgy0404/MemGUI-8B-SFT

Finetuned
(298)
this model
Quantizations
1 model

Dataset used to train lgy0404/MemGUI-8B-SFT

Collection including lgy0404/MemGUI-8B-SFT

Evaluation results