Hugging Apps

community

AI & ML interests

None defined yet.

Recent Activity

multimodalart updated a collection 24 minutes ago

Community Spaces

hugging-apps's Spaces (62)

PerceptionDLM Region Captioning

Parallel region captioning with multimodal diffusion LLM

FlowUpscaler

2x latent super-resolution with FlowUpscaler in Flux.2 space

SeFi-Image-5B-Base

Text-to-image with SeFi-Image-5B Semantic-First Diffusion

RefControl FLUX.2 Klein - Reference + Lineart

Keep identity from reference, follow lineart structure

MOSS-Music-8B-Thinking

Music understanding model for caption and analysis

Ex-Omni Talking Avatar

Text/speech to spoken response + 3D talking-avatar video

README

JoyAI Image Edit Plus

Multi-image instruction-guided image editing

Qwen3 Forced Aligner

Word-level timestamp alignment from audio + transcript

DomainShuttle Subject Video

Subject-driven text-to-video from reference images (Wan2.2)

SAM2Matting

Image matting with diverse prompts via SAM2Matting

MMDiff

Multi-modal generation with diffusion transformers

Anima ControlNet VACE Depth

Anima depth-conditioned image generation via VACE ControlNet

BS-Roformer Leap Audio Separator

Separate audio into vocals and instruments with BS-Roformer

Whisper Small Polish

Polish speech recognition with fine-tuned Whisper Small

Fast-FoundationStereo

Real-time zero-shot stereo disparity estimation

PhoneBuddy-4B

Phone-use GUI agent - screenshot + task to next action

VISTA GUI Grounding

GUI grounding with VISTA-9B - predict click coordinates

VIEW2SPACE-4B

Multi-view visual reasoning VLM based on Qwen3-VL 4B

MAOAM

Object and Material Selection VLM

KDL-Frontier-Parser-nano

Document-parsing VLM (1.2B) by KoreaDeep

Kokoro Vietnamese TTS

Vietnamese text-to-speech with Kokoro TTS

SenseNova-U1-8B-MoT-Interleaved

Interleaved text and image generation with SenseNova-U1

UniAR

Unified AR model for image understanding & generation