GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
Abstract
GUI-CIDER is a mid-training method that explicitly incorporates GUI world knowledge through causal internalization and density-aware exemplar reselection to improve GUI agent performance.
Despite the rapid progress of multimodal large language models in building Graphical User Interface (GUI) agents, their real-world task completion is fundamentally bottlenecked by a lack of world knowledge about GUI operations. Existing solutions typically rely on expensive multi-agent scaffolding or on conventional post-training paradigms such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). However, post-training lets agents absorb world knowledge only implicitly, through action annotations or reward signals, which leads to inefficient trajectory memorization rather than genuine comprehension. An approach that enables explicit learning of this knowledge is therefore imperative. To this end, we propose GUI-CIDER, a mid-training method that explicitly internalizes GUI world knowledge through Causal Internalization and Density-aware Exemplar Reselection. GUI-CIDER operates in three stages: (1) data synthesis, which distills static planning knowledge and dynamic causal knowledge from GUI trajectories into text; (2) exemplar reselection, which filters the resulting corpus by rewarding causal structure and penalizing semantic redundancy; and (3) mid-training, in which the refined data is used to embed the acquired knowledge into the model. Extensive experiments on two GUI knowledge benchmarks and three task-completion benchmarks demonstrate that GUI-CIDER consistently improves both the agent's understanding of GUI operations and its task success rates. The code is available at https://github.com/Wuzheng02/GUI-CIDER.
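The abstract describes exemplar reselection only at a high level: reward causal structure, penalize semantic redundancy. The sketch below is a hypothetical illustration of that idea, not the released GUI-CIDER code. It assumes each synthesized text already has an embedding from some sentence encoder, scores "causal structure" with a crude cue-phrase heuristic (`CAUSAL_CUES` and `causal_score` are invented stand-ins for whatever reward the authors use), and greedily selects exemplars in a maximal-marginal-relevance style, subtracting `redundancy_weight` times the highest cosine similarity to anything already selected. A density penalty computed over the whole corpus would be an equally plausible reading of "density-aware".

```python
import math
import re
from dataclasses import dataclass

# Hypothetical cue phrases used as a crude proxy for "causal structure" in a text sample.
CAUSAL_CUES = ("because", "causes", "leads to", "results in",
               "after tapping", "after clicking", "so that")


@dataclass
class Exemplar:
    text: str
    embedding: list  # assumed output of some sentence encoder (not specified by the source)


def causal_score(text: str) -> float:
    """Count causal cue phrases per sentence (illustrative heuristic only)."""
    lowered = text.lower()
    hits = sum(lowered.count(cue) for cue in CAUSAL_CUES)
    sentences = max(1, len(re.findall(r"[.!?]", text)))
    return hits / sentences


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reselect(corpus, k, redundancy_weight=1.0):
    """Greedily pick k exemplars: reward causal score, penalize similarity to picks so far."""
    base = [causal_score(e.text) for e in corpus]
    selected = []
    remaining = list(range(len(corpus)))
    while remaining and len(selected) < k:
        def gain(i):
            # Redundancy = closest match among already-selected exemplars (0 if none yet).
            redundancy = max((cosine(corpus[i].embedding, corpus[j].embedding)
                              for j in selected), default=0.0)
            return base[i] - redundancy_weight * redundancy

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return [corpus[i] for i in selected]


# Toy corpus with made-up 2-D embeddings; the second entry is a near-duplicate of the first.
corpus = [
    Exemplar("Tapping Save leads to a confirmation toast because the draft is stored.", [1.0, 0.0]),
    Exemplar("Tapping Save leads to a toast because the draft is stored.", [0.99, 0.1]),
    Exemplar("Opening Settings results in the Wi-Fi menu appearing.", [0.0, 1.0]),
    Exemplar("The home screen shows icons.", [0.7, 0.7]),
]
picked = reselect(corpus, k=2, redundancy_weight=2.0)
for e in picked:
    print(e.text)
```

With `redundancy_weight=2.0`, the near-duplicate second sentence loses to the less causal but semantically distinct Settings example; with a small enough weight the duplicate would be kept instead, so the weight controls the trade-off between causal richness and corpus diversity.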
The following papers were recommended by the Semantic Scholar API:
- Towards Scalable Lightweight GUI Agents via Multi-role Orchestration (2026)
- LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning (2026)
- Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining (2026)
- SE-GA: Memory-Augmented Self-Evolution for GUI Agents (2026)
- Faithful Mobile GUI Agents with Guided Advantage Estimator (2026)
- GUI Agents with Reinforcement Learning: Toward Digital Inhabitants (2026)
- Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding (2026)