Multimodal AI Agents

community

https://microsoft.github.io/Magma/

Activity Feed

AI & ML interests

None defined yet.

alvarobartt

posted an update 8 days ago

Post

257

Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker

alvarobartt

posted an update 12 days ago

Post

3261

Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL;DR: MoEs can be misleading to reason about from active parameters alone. Each token activates only a subset of experts, but the serving setup still has to account for the full resident memory footprint.

🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ Active parameter count isn't the same as memory footprint, especially for sparse architectures
📦 Runtime memory reflects what is used per request/token, while load-time memory also includes all the expert weights that need to stay resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving

Check the repository at https://github.com/alvarobartt/hf-mem
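
To make the active-versus-resident distinction above concrete, here is a minimal back-of-the-envelope sketch. It is not how hf-mem computes its breakdown; the `MoEShape` fields, the example numbers, and the assumption that base weights are replicated on every device while experts are sharded evenly under Expert Parallelism are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

# Bytes per parameter for a few common weight dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


@dataclass(frozen=True)
class MoEShape:
    """Hypothetical, simplified description of an MoE model."""
    base_params: int        # non-expert weights (attention, embeddings, routers)
    moe_layers: int         # layers that contain routed experts
    experts_per_layer: int  # routed experts stored in each MoE layer
    expert_params: int      # parameters in a single expert
    top_k: int              # experts each token is routed to per layer


def resident_params(s: MoEShape) -> int:
    # Every expert must be loaded, whether or not a given token uses it.
    return s.base_params + s.moe_layers * s.experts_per_layer * s.expert_params


def active_params(s: MoEShape) -> int:
    # Only the top_k routed experts per layer take part in one token's forward pass.
    return s.base_params + s.moe_layers * s.top_k * s.expert_params


def per_device_params(s: MoEShape, ep_size: int) -> int:
    # Assumption: base weights are replicated, experts are split evenly across devices.
    if s.experts_per_layer % ep_size != 0:
        raise ValueError("experts_per_layer must be divisible by ep_size")
    local_experts = s.experts_per_layer // ep_size
    return s.base_params + s.moe_layers * local_experts * s.expert_params


def to_gib(params: int, dtype: str = "bfloat16") -> float:
    return params * BYTES_PER_PARAM[dtype] / 2**30


# Made-up shape: 2B base weights, 32 MoE layers, 8 experts of 150M each, top-2 routing.
shape = MoEShape(2_000_000_000, 32, 8, 150_000_000, 2)

if __name__ == "__main__":
    print(f"resident: {to_gib(resident_params(shape)):.1f} GiB")
    print(f"active:   {to_gib(active_params(shape)):.1f} GiB")
    print(f"per device with EP=8: {to_gib(per_device_params(shape, 8)):.1f} GiB")
```

With these made-up numbers the model has 11.6B active parameters per token but 40.4B resident parameters, which is why sizing hardware from the active count alone underestimates what has to be loaded. KV cache comes on top of either figure.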

alvarobartt

posted an update 3 months ago

Post

3742

Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr

alvarobartt

posted an update 4 months ago

Post

3269

💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

Running uvx hf-mem --model-id ... --experimental automatically pulls the required information from the Hugging Face Hub and includes the KV cache estimation when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
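
For intuition about how context length, batch size and cache dtype drive the number, the usual back-of-the-envelope KV cache formula can be sketched as below. This is only an assumption-laden estimate for standard multi-head or grouped-query attention where every layer keeps a full cache; it is not taken from hf-mem's implementation, and the function name and example values are hypothetical.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    max_model_len: int,
    batch_size: int = 1,
    dtype_bytes: int = 2,  # 2 for fp16/bf16, 1 for an 8-bit cache
) -> int:
    """Rough KV cache size: keys and values for every layer, head and token."""
    # The leading 2 accounts for storing both the key and the value tensor.
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return per_token * max_model_len * batch_size


def to_gib(num_bytes: int) -> float:
    return num_bytes / 2**30


if __name__ == "__main__":
    # Made-up shape: 32 layers, 8 KV heads of dimension 128, 8192-token context.
    single = kv_cache_bytes(32, 8, 128, max_model_len=8192)
    batched = kv_cache_bytes(32, 8, 128, max_model_len=8192, batch_size=4)
    eight_bit = kv_cache_bytes(32, 8, 128, max_model_len=8192, dtype_bytes=1)
    print(f"batch 1, 16-bit: {to_gib(single):.2f} GiB")
    print(f"batch 4, 16-bit: {to_gib(batched):.2f} GiB")
    print(f"batch 1, 8-bit:  {to_gib(eight_bit):.2f} GiB")
```

The estimate scales linearly with context length and batch size, so doubling either doubles the cache. Architectures with sliding-window or compressed attention would need a different formula.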


jw2yang

updated a dataset about 1 year ago

MagmaAI/Magma-AITW-SoM

Viewer • Updated Apr 29, 2025 • 19k • 170 • 2

jw2yang

published a dataset about 1 year ago

MagmaAI/Magma-AITW-SoM

Viewer • Updated Apr 29, 2025 • 19k • 170 • 2

jw2yang

updated a dataset about 1 year ago

MagmaAI/Magma-Mind2Web-SoM

Viewer • Updated Apr 29, 2025 • 7.21k • 201 • 2

jw2yang

published a dataset about 1 year ago

MagmaAI/Magma-Mind2Web-SoM

Viewer • Updated Apr 29, 2025 • 7.21k • 201 • 2

jw2yang

updated a dataset about 1 year ago

MagmaAI/Magma-Video-ToM

Viewer • Updated Apr 12, 2025 • 2.21M • 921 • 4

jw2yang

published a dataset about 1 year ago

MagmaAI/Magma-Video-ToM

Viewer • Updated Apr 12, 2025 • 2.21M • 921 • 4

jw2yang

updated a dataset about 1 year ago

MagmaAI/Magma-OXE-ToM

Viewer • Updated Apr 6, 2025 • 6.13M • 1.24k • 3

jw2yang

published 2 datasets about 1 year ago

MagmaAI/Magma-OXE-ToM

Viewer • Updated Apr 6, 2025 • 6.13M • 1.24k • 3

MagmaAI/Magma-820K

Updated Mar 9, 2025 • 41 • 5

jw2yang

updated a dataset about 1 year ago

MagmaAI/Magma-820K

Updated Mar 9, 2025 • 41 • 5

alvarobartt

posted an update over 1 year ago

Post

3647

🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!

Magma is a new Visual Language Model (VLM) with 8B parameters for multi-modal agents, designed to handle complex interactions across virtual and real environments, and it's MIT licensed!

Magma comes with exciting new features such as:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn spatial-temporal grounding and planning
- Generalizes strongly and can be fine-tuned for other agentic tasks
- Reaches SOTA on multi-modal benchmarks spanning UI navigation, robotic manipulation, image / video understanding, and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases

Model: microsoft/Magma-8B
Technical Report: Magma: A Foundation Model for Multimodal AI Agents (2502.13130)
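
As a loose illustration of the two labeling ideas named above, the sketch below assumes that Set-of-Mark means assigning numbered marks to candidate regions of an image so an action can refer to a mark ID, and that Trace-of-Mark means following those marks across later video frames. It is a toy data-structure example, not the Magma training pipeline; `Box`, `set_of_mark` and `trace_of_mark` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned candidate region in pixel coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def set_of_mark(boxes: list[Box]) -> dict[int, tuple[float, float]]:
    """Give each candidate region a numeric mark, keyed 1..N, at its center."""
    return {i + 1: box.center() for i, box in enumerate(boxes)}


def trace_of_mark(
    frames: list[dict[int, tuple[float, float]]],
) -> dict[int, list[tuple[float, float]]]:
    """Collect, per mark in the first frame, its positions over all frames."""
    if not frames:
        return {}
    return {
        mark: [frame[mark] for frame in frames if mark in frame]
        for mark in frames[0]
    }


if __name__ == "__main__":
    marks = set_of_mark([Box(0, 0, 10, 20), Box(10, 10, 30, 30)])
    print(marks)  # mark IDs mapped to region centers

    # Mark 1 drifting over three frames of a hypothetical video.
    frames = [{1: (5.0, 10.0)}, {1: (6.0, 11.0)}, {1: (8.0, 12.0)}]
    print(trace_of_mark(frames))
```

In this toy form, a grounded action could be expressed as "act on mark 2" and a plan as the future positions of a mark.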

jw2yang

authored 5 papers over 1 year ago

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Paper • 2501.05452 • Published Jan 9, 2025 • 15

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

Paper • 2412.10345 • Published Dec 13, 2024 • 2

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Paper • 2412.09585 • Published Dec 12, 2024 • 11

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 62

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Paper • 2410.10818 • Published Oct 14, 2024 • 16

Team members 2

MagmaAI's activity